SLURM Workload Manager
Overview
The AUHPCS cluster uses SLURM (Simple Linux Utility for Resource Management) as its workload manager. SLURM is responsible for managing and scheduling cluster resources and jobs. Cluster resource assignments are referred to as allocations, and job queues are referred to as partitions in SLURM terminology.
The SLURM utilities are available on job submission nodes and the software build node.
Official SLURM documentation: SchedMD SLURM
SLURM Partition Assignments
Each compute node profile is assigned to a SLURM partition, which determines how jobs are scheduled. See the Cluster Overview page for more information.
SLURM Partition Mapping:
| Partition Name | Compute Node Assignment | Notes | Memory Per Node | Total Memory |
|---|---|---|---|---|
| interactive_q | (1) general compute node, (1) GPU node | Used for interactive workloads | 96 GB / 768 GB | Variable |
| cpu_normal_q | (18) general compute nodes | Default partition for standard CPU jobs | 96 GB / 768 GB | 1.73 TB |
| cpu_middle_mem_q | (8) middle memory compute nodes | For memory-intensive CPU workloads | 768 GB | 6 TB |
| cpu_high_mem_q | (2) high memory compute nodes | For high memory workloads | 1.53 TB | 3.06 TB |
| gpu_normal_q | (2) RTX6000 GPU compute nodes | For GPU workloads requiring moderate power | 768 GB | 1.54 TB |
| gpu_middle_ai_q | (2) T4 GPU compute nodes | Suitable for AI, ML, and inference workloads | 768 GB | 1.54 TB |
| gpu_high_ai_q | (1) DGX A100 GPU compute node | Optimized for large-scale AI and deep learning | 1 TB | 1 TB |
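The current state of these partitions (node counts, availability, and time limits) can be checked from a submission node with SLURM's sinfo command. A quick sketch using the partition names above (exact output columns depend on the site configuration):
# List all partitions with their node counts, states, and time limits
$ sinfo

# Show a single partition, e.g. the default CPU partition
$ sinfo -p cpu_normal_q

# Node-oriented, long-format view with per-node CPU and memory details
$ sinfo -p cpu_normal_q -N -l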
Partition Reservation Policies
Some partitions in AUHPCS require special reservation procedures:
cpu_high_mem_q and gpu_high_ai_q are high-memory queues and must be reserved in advance. To reserve these partitions, users must email auhpcs_support@augusta.edu with project details and the expected runtime.
The maximum reservation period is 10 days.
Reservations are granted based on resource availability and priority.
Please plan ahead and request access early if your work requires these high-capacity nodes.
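Once a reservation has been granted, jobs are typically submitted into it with sbatch's --reservation option. A minimal sketch, assuming a hypothetical reservation name of project_resv (use the name provided by auhpcs_support@augusta.edu; my_job.sh stands for your batch script):
# Submit into a granted reservation on the high-memory partition
# (project_resv is a placeholder for the reservation name you receive)
$ sbatch --partition=cpu_high_mem_q --reservation=project_resv my_job.sh

# List active reservations and their start/end times
$ scontrol show reservation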
Job Submission
Job submission is the process of requesting resources from the scheduler. It is the gateway to all the computational horsepower in the cluster. Users submit jobs to tell the scheduler what resources are needed and for how long. The scheduler then evaluates the request according to resource availability and cluster policy to determine when the job will run and which resources to use.
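As an illustration of this request-and-evaluate cycle, the sketch below submits a placeholder job script and tracks it through the queue (my_job.sh and the job ID 12345 are examples only; sbatch, squeue, and scontrol are the standard SLURM commands for these steps):
# Submit a job script; sbatch prints the ID assigned by the scheduler
$ sbatch my_job.sh
Submitted batch job 12345

# Check the job's state: PD (pending) while it waits for resources, R once it is running
$ squeue -u $USER

# Show full scheduling details, including the reason a job is still pending
$ scontrol show job 12345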
Batch Job Submission
Batch jobs are submitted using SLURM job scripts. SLURM directives can be supplied in the job script as header lines (#SBATCH), as command-line options to the sbatch command, or as a combination of both. When both are given, the command-line option takes precedence.
The general form of the sbatch command:
$ sbatch [OPTIONS(0)...] [ : [OPTIONS(N)...]] script(0) [args(0)...]
Example:
$ sbatch -N1 -t 1:00:00 my_job.sh
$ cat my_job.sh
#!/bin/bash
#SBATCH --job-name=my_job # job name
#SBATCH --ntasks=10 # number of tasks across all nodes
#SBATCH --partition=cpu_normal_q # name of partition to submit job
#SBATCH --time=01:00:00 # run time (D-HH:MM:SS)
#SBATCH --output=job_output.txt # output file
#SBATCH --error=job_error.txt # error file
#SBATCH --mail-type=ALL # will send email for begin,end,fail
#SBATCH --mail-user=user@augusta.edu
srun ./my_application
This batch job submission requests one node (-N1) with a walltime of 1 hour (-t 1:00:00) on the command line, and a total of ten tasks (--ntasks=10) through the #SBATCH directives in the job script. The job is assigned to the cpu_normal_q partition, and output/error logs are redirected to job_output.txt and job_error.txt, respectively. The application is launched with srun.
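Because command-line options take precedence over #SBATCH directives, the same script can be reused with a different resource request without editing it. A short sketch based on the script above:
# Override the script's --ntasks=10 directive from the command line;
# the job runs with 4 tasks, everything else still comes from the #SBATCH lines
$ sbatch --ntasks=4 my_job.sh

# Cancel a submitted job by its job ID if it is no longer needed
$ scancel <jobid>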
Note
Users can find pre-configured SLURM job script templates at: $ ls -l /home/<username>/scripts/templates/
Best Practices
Always select the appropriate partition based on job requirements.
Do not over-utilize small resource groups or under-utilize large nodes.
Use interactive jobs for debugging and development, then submit batch jobs for full-scale runs (see the interactive session example at the end of this section).
Optimize scripts for efficient resource usage and avoid idle node occupation.
For additional SLURM support, contact auhpcs_support@augusta.edu.
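For the interactive debugging workflow mentioned in the best practices above, an interactive shell on a compute node can typically be requested from the interactive_q partition with srun. A minimal sketch (the resource values are examples only; adjust them to your work):
# Request an interactive shell: 1 node, 4 tasks, 2 hours on the interactive partition
$ srun --partition=interactive_q --nodes=1 --ntasks=4 --time=02:00:00 --pty bash

# When finished, exit the shell to release the allocation
$ exit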