Vali User Guide

Overview

Vali is an academic-only pilot cluster intended exclusively for instructional use and student-led research projects. It is a learning and training platform, not a production-grade computing resource. The system runs on repurposed hardware and therefore carries no warranty or performance guarantees; uptime and stability may vary, and users should expect occasional interruptions given the system's pilot nature. Support for Vali is provided only during standard business hours; no after-hours assistance is available.

Important

All course-related data stored on the system will be cleared at the end of each semester, so users must ensure their work is backed up externally to prevent data loss.

Access

Users connect to one of the following nodes, depending on the task:

  • Batch Submission:

> ssh pparker@acad-hpcsub.augusta.edu
  • Interactive/Software Builds:

> ssh pparker@acad-hpcbld.augusta.edu
  • Data Transfer:

> ssh pparker@acad-hpcxfer.augusta.edu
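
For example, to pull a backup of course data down to your own machine through the transfer node (a sketch run from your local computer; my_course_dir and the destination path are placeholders):

> rsync -av pparker@acad-hpcxfer.augusta.edu:~/my_course_dir/ ~/vali_backup/my_course_dir/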

File Systems

Users of Vali have access to two storage areas:

  • HOME directory /home - Persistent user storage for job scripts and small files.

  • LOCAL SCRATCH file system /lscratch - Temporary storage local to compute nodes, wiped after job completion or node reboot.
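Because /lscratch does not survive the job, a common pattern is to stage input into node-local scratch at the start of a job, compute there, and copy results back to /home before the job ends. A minimal sketch for the inside of a job script (paths and the processing step are illustrative):

# Stage into node-local scratch, compute, then copy results home
mkdir -p /lscratch/$USER
cp ~/input.dat /lscratch/$USER/
cd /lscratch/$USER
python process.py input.dat > results.txt   # hypothetical processing step
cp results.txt ~/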

Batch Submission

To submit a batch job, follow these steps:

1. Log in to the cluster:

> ssh tstark@acad-hpcsub.augusta.edu

Example output:

[tstark@Vali_batch_submission ~]$

2. Check system status:

[tstark@Vali_batch_submission ~]$ sinfo
[tstark@Vali_batch_submission ~]$ squeue

Example output:

JOBID   PARTITION   NAME       USER      ST    TIME      NODES  NODELIST(REASON)
121277  cpu_middl   cosmx-RN   tstark    R     17:11:00  1      cnode019
121294  cpu_norma   1DLN_GEX   bbanner   R     4:21:30   1      cnode001
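
sinfo accepts similar formatting control; for instance, to list each partition's time limit, node count, and node state (standard sinfo format options):

[tstark@Vali_batch_submission ~]$ sinfo -o "%P %l %D %t"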

To check your jobs specifically:

[tstark@Vali_batch_submission ~]$ squeue -u $USER
JOBID  PARTITION     NAME    USER      ST       TIME  NODES NODELIST(REASON)
121322 cpu_norma     bash    tstark    R        0:58      1     cnode016

To view job details:

[tstark@Vali_batch_submission ~]$ scontrol show job 121322
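
The full scontrol listing is long; piping it through grep pulls out just the fields of interest, e.g.:

[tstark@Vali_batch_submission ~]$ scontrol show job 121322 | grep -E 'JobState|RunTime|TimeLimit'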

To check job resource consumption:

While the job is running:

[tstark@Vali_batch_submission ~]$ sstat -j 121322.batch --format=JobID,MaxRSS,AveCPU,Elapsed

After the job is done:

[tstark@Vali_batch_submission ~]$ sacct -j 121322 --format=JobID,JobName,MaxRSS,Elapsed,State
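
If the seff utility (a common Slurm add-on, assuming it is installed on Vali) is available, it summarizes a finished job's CPU and memory efficiency in one screen:

[tstark@Vali_batch_submission ~]$ seff 121322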

3. Create a Python script (batch_permutations.py):

[tstark@Vali_batch_submission ~]$ vi batch_permutations.py
# import the library
import itertools

# A small list of numbers
numbers = [1, 2, 3, 4]

# Generate all 2-number permutations
perms = list(itertools.permutations(numbers, 2))

# Calculate the sum of each permutation
sums = [sum(p) for p in perms]

# Print results
print("Permutations of 2 numbers:", perms)
print("Sums of each permutation:", sums)
print(f"Total number of permutations: {len(perms)}")

4. Create a batch submission script (submit_quick_job.sh):

[tstark@Vali_batch_submission ~]$ vi submit_quick_job.sh
#!/bin/bash
#SBATCH --job-name=perm_job             # Job name
#SBATCH --output=perm_output_%j.txt     # Output log file
#SBATCH --ntasks=1                      # Number of tasks
#SBATCH --cpus-per-task=1               # CPU cores per task
#SBATCH --mem=1G                        # Minimal memory
#SBATCH --time=00:01:00                 # 1-minute runtime
#SBATCH --partition=cpu_normal_q        # SLURM partition

# Load Python
module load Python/3.10.8-GCCcore-12.2.0

# Run the Python script
python batch_permutations.py
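
Any #SBATCH directive can also be overridden on the sbatch command line without editing the script; for example, to allow a longer runtime for one submission:

[tstark@Vali_batch_submission ~]$ sbatch --time=00:05:00 submit_quick_job.sh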

5. Submit the job:

[tstark@Vali_batch_submission ~]$ sbatch submit_quick_job.sh
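
sbatch prints the ID assigned to the new job (the ID here matches the output file shown in step 6):

Submitted batch job 121394

Then confirm it is queued or running: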
[tstark@Vali_batch_submission ~]$ squeue -u $USER

6. Check the files generated (ll is a common alias for ls -l):

[tstark@Vali_batch_submission ~]$ ll

Example output:

-rw-r--r-- 1 tstark auhpcs_aveng_g  683 Mar 13 13:28 batch_permutations.py
-rw-r--r-- 1 tstark auhpcs_aveng_g    0 Mar 13 09:16 perm_output_121394.txt
-rw-r--r-- 1 tstark auhpcs_aveng_g  835 Mar 12 15:07 submit_quick_job.sh
[tstark@Vali_batch_submission ~]$ cat perm_output_121394.txt

Example output:

Permutations of 2 numbers: [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)]
Sums of each permutation: [3, 4, 5, 3, 5, 6, 4, 5, 7, 5, 6, 7]
Total number of permutations: 12

Interactive Job Execution

1. Log in to the interactive node:

> ssh bbanner@acad-hpcbld.augusta.edu

2. Request interactive resources:

[bbanner@Vali_inter_submission ~]$ salloc --nodes=1 --ntasks=1 --cpus-per-task=2 --time=02:00:00 --partition=interactive_q

Example output:

salloc: Granted job allocation 121322

[bbanner@Vali_inter_submission ~]$ squeue -u $USER
JOBID   PARTITION     NAME    USER      ST       TIME  NODES NODELIST(REASON)
121322 interactive     bash   bbanner    R       2:00      1    cnode016

3. Start an interactive shell on the allocated compute node:

[bbanner@Vali_inter_submission ~]$ srun --jobid 121322 --pty /bin/bash -i

Slurm launches the shell on your allocated compute node through its own daemons (slurmd), not SSH. You are now inside a session on the compute node.
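
The two-step allocate-then-attach flow above can also be collapsed into a single command; a sketch with the same resource parameters:

[bbanner@Vali_inter_submission ~]$ srun --nodes=1 --ntasks=1 --cpus-per-task=2 --time=02:00:00 --partition=interactive_q --pty /bin/bash -i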

4. Load necessary modules:

[bbanner@cnode016 ~]$ module list
[bbanner@cnode016 ~]$ module avail python
[bbanner@cnode016 ~]$ module load Python/3.10.8-GCCcore-12.2.0
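
If leftover modules conflict with what you are loading, the environment can be reset first (module purge is standard in Environment Modules/Lmod):

[bbanner@cnode016 ~]$ module purge
[bbanner@cnode016 ~]$ module load Python/3.10.8-GCCcore-12.2.0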

5. Create a Python script (interactive_permutations.py) with the same contents as batch_permutations.py above:

[bbanner@cnode016 ~]$ vi interactive_permutations.py

6. Run the Python script:

[bbanner@cnode016 ~]$ python interactive_permutations.py

Example output:

Permutations of 2 numbers: [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)]
Sums of each permutation: [3, 4, 5, 3, 5, 6, 4, 5, 7, 5, 6, 7]
Total number of permutations: 12

7. Handling Job Expiry:

If a job exceeds its time limit, Slurm revokes the allocation and prints a notice in your session:

[bbanner@cnode016 ~]$ salloc: Job 121322 has exceeded its time limit and its allocation has been revoked.
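
To see how much walltime remains before this happens, the allocation can be queried from inside the session (%L is squeue's time-left field; this assumes the Slurm client tools are available on the compute nodes):

[bbanner@cnode016 ~]$ squeue -h -j $SLURM_JOB_ID -o "%L"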

8. To log out of the interactive session:

[bbanner@cnode016 ~]$ exit
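
Exiting the srun shell returns you to the salloc session on the login node; exiting once more there releases the allocation. It can also be cancelled explicitly from anywhere:

[bbanner@Vali_inter_submission ~]$ scancel 121322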

Note

Remember to back up all files externally. /lscratch is cleared after every job or node reboot.


For further assistance, visit our Support Page or contact the team at auhpcs_support@augusta.edu.
Support is available only during normal business hours, and data preservation cannot be guaranteed.