Vali User Guide
Overview
Vali is an academic-only pilot cluster designed exclusively for instructional use and student-led research projects. It serves as a learning and training platform, not a production-grade computing resource. The system operates on repurposed hardware, and therefore offers no warranty or performance guarantees. Uptime and stability may vary, and users should expect occasional interruptions as part of its pilot nature. Support for Vali is provided only during standard business hours; no after-hours assistance is available.
Important
All course-related data stored on the system will be cleared at the end of each semester, so users must ensure their work is backed up externally to prevent data loss.
Access
Users can connect to the following nodes, depending on the task:
Batch Submission:
> ssh pparker@acad-hpcsub.augusta.edu
Interactive/Software Builds:
> ssh pparker@acad-hpcbld.augusta.edu
Data Transfer:
> ssh pparker@acad-hpcxfer.augusta.edu
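For example, to back up results to your own computer you can pull them through the data transfer node with scp or rsync. A minimal sketch, run from your local machine (the source and destination paths are placeholders):
> scp -r pparker@acad-hpcxfer.augusta.edu:~/results ./vali_backup/
> rsync -av pparker@acad-hpcxfer.augusta.edu:~/results/ ./vali_backup/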
File Systems
Users of Vali have access to two storage areas:
HOME directory
/home - Persistent user storage for job scripts and small files.
LOCAL SCRATCH file system
/lscratch - Temporary storage local to compute nodes, wiped after job completion or node reboot.
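Because /lscratch is wiped when a job ends, a common pattern is to do temporary work there and copy anything worth keeping back to /home before the job finishes. A minimal sketch of that pattern inside a batch script (the per-job directory layout and file names are assumptions, not a guaranteed Vali convention):
#!/bin/bash
#SBATCH --job-name=scratch_demo
#SBATCH --time=00:05:00
# Create a private working directory in node-local scratch (assumed layout)
WORKDIR=/lscratch/$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cd "$WORKDIR"
# ... run your computation here, writing temporary output into $WORKDIR ...
# Copy the results you want to keep back to persistent /home storage
cp -r "$WORKDIR"/results ~/scratch_demo_results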
Batch Submission
To submit a batch job, follow these steps:
1. Login to the cluster:
> ssh tstark@acad-hpcsub.augusta.edu
Example output:
[tstark@Vali_batch_submission ~]$
2. Check system status:
[tstark@Vali_batch_submission ~]$ sinfo
[tstark@Vali_batch_submission ~]$ squeue
Example output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
121277 cpu_middl cosmx-RN tstark R 17:11:00 1 cnode019
121294 cpu_norma 1DLN_GEX bbanner R 4:21:30 1 cnode001
To check your jobs specifically:
[tstark@Vali_batch_submission ~]$ squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
121322 cpu_norma bash tstark R 0:58 1 cnode016
To view job details:
[tstark@Vali_batch_submission ~]$ scontrol show job 121322
To check job resource consumption:
While the job is running:
[tstark@Vali_batch_submission ~]$ sstat -j 121322.batch --format=JobID,MaxRSS,AveCPU,Elapsed
After the job is done:
[tstark@Vali_batch_submission ~]$ sacct -j 121322 --format=JobID,JobName,MaxRSS,Elapsed,State
3. Create a Python script (batch_permutations.py):
[tstark@Vali_batch_submission ~]$ vi batch_permutations.py
# import the library
import itertools

# A small list of numbers
numbers = [1, 2, 3, 4]

# Generate all 2-number permutations
perms = list(itertools.permutations(numbers, 2))

# Calculate the sum of each permutation
sums = [sum(p) for p in perms]

# Print results
print("Permutations of 2 numbers:", perms)
print("Sums of each permutation:", sums)
print(f"Total number of permutations: {len(perms)}")
(Optional) Test the script directly before submitting it as a batch job:
[tstark@Vali_batch_submission ~]$ python batch_permutations.py
4. Create a batch submission script (submit_quick_job.sh):
[tstark@Vali_batch_submission ~]$ vi submit_quick_job.sh
#!/bin/bash
#SBATCH --job-name=perm_job # Job name
#SBATCH --output=perm_output_%j.txt # Output log file
#SBATCH --ntasks=1 # Number of tasks
#SBATCH --cpus-per-task=1 # CPU cores per task
#SBATCH --mem=1G # Minimal memory
#SBATCH --time=00:01:00 # 1-minute runtime
#SBATCH --partition=cpu_normal_q # SLURM partition
# Load Python
module load Python/3.10.8-GCCcore-12.2.0
# Run the Python script
python batch_permutations.py
5. Submit the job:
[tstark@Vali_batch_submission ~]$ sbatch submit_quick_job.sh
[tstark@Vali_batch_submission ~]$ squeue -u $USER
6. Check files generated:
[tstark@Vali_batch_submission ~]$ ll
Example output:
-rw-r--r-- 1 tstark auhpcs_aveng_g 683 Mar 13 13:28 batch_permutations.py
-rw-r--r-- 1 tstark auhpcs_aveng_g 0 Mar 13 09:16 perm_output_121394.txt
-rw-r--r-- 1 tstark auhpcs_aveng_g 835 Mar 12 15:07 submit_quick_job.sh
[tstark@Vali_batch_submission ~]$ cat perm_output_121394.txt
Example output:
Permutations of 2 numbers: [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)]
Sums of each permutation: [3, 4, 5, 3, 5, 6, 4, 5, 7, 5, 6, 7]
Total number of permutations: 12
Interactive Job Execution
1. Login to the interactive node:
> ssh bbanner@acad-hpcbld.augusta.edu
2. Request interactive resources:
[bbanner@Vali_inter_submission ~]$ salloc --nodes=1 --ntasks=1 --cpus-per-task=2 --time=02:00:00 --partition=interactive_q
Example output:
salloc: Granted job allocation 121322
[bbanner@Vali_inter_submission ~]$ squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
121322 interactive bash bbanner R 2:00 1 cnode016
3. Start an interactive shell on the allocated compute node:
[bbanner@Vali_inter_submission ~]$ srun --jobid 121322 --pty /bin/bash -i
Slurm launches the shell on your allocated compute node through its internal daemons; no separate SSH login is needed. You're now inside your allocated compute node session.
4. Load necessary modules:
[bbanner@cnode016 ~]$ module list
[bbanner@cnode016 ~]$ module avail python
[bbanner@cnode016 ~]$ module load Python/3.10.8-GCCcore-12.2.0
5. Create an interactive Python script (interactive_permutations.py):
[bbanner@cnode016 ~]$ vi interactive_permutations.py
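For this walkthrough the file can contain the same itertools example shown in step 3 of the Batch Submission section. If you already created batch_permutations.py under your account, copying it is enough (a sketch, assuming that file exists in your home directory):
[bbanner@cnode016 ~]$ cp ~/batch_permutations.py ~/interactive_permutations.py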
6. Run the Python script:
[bbanner@cnode016 ~]$ python interactive_permutations.py
Example output:
Permutations of 2 numbers: [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)]
Sums of each permutation: [3, 4, 5, 3, 5, 6, 4, 5, 7, 5, 6, 7]
Total number of permutations: 12
7. Handling Job Expiry:
If a job exceeds its time limit, SLURM will revoke the allocation.
[bbanner@cnode016 ~]$ salloc: Job 121322 has exceeded its time limit and its allocation has been revoked.
8. To log out of the interactive session:
[bbanner@cnode016 ~]$ exit
Note
Remember to back up all files externally. /lscratch is cleared after every job or node reboot.