Cluster Overview
Job Submission Nodes
Job submission nodes, sometimes referred to as login nodes, allow users to authenticate to the cluster. They also provide the applications required for scripting, submitting, and managing batch compute jobs. Batch compute jobs are submitted to the cluster work queue, where they wait to be scheduled and run once the requested compute resources are available.
Important
There are two job submission nodes in an active/active configuration, meaning user access to the cluster is load-balanced across both nodes.
Do not run applications or compute jobs on job submission nodes. These are special-purpose virtual systems with minimal resources. Running compute work on these nodes will degrade usability for all users.
AUHPCS job submission nodes can be accessed using supported OpenSSH clients via DNS: hpc-sub.augusta.edu
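For example, assuming an AUHPCS account with the hypothetical username jdoe, a connection can be opened from any terminal with an OpenSSH client (the username and key path below are placeholders):

ssh jdoe@hpc-sub.augusta.edu
ssh -i ~/.ssh/id_ed25519 jdoe@hpc-sub.augusta.edu   # using a specific private key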
Data Transfer Nodes
Data transfer nodes provide access to user file systems in the cluster. Their role is to facilitate high-speed transfer of data across those file systems within the cluster. These nodes can also be used to transfer data into and out of the cluster.
Additional Notes:
Data transfer nodes provide access to all user file systems.
Transfers to and from endpoints external to the cluster may be slower than internal cluster transfers due to networking throughput limitations.
Best practice: Users should move inactive data from /scratch to /work or /project to free up space.
AUHPCS data transfer nodes can be accessed using supported OpenSSH clients via DNS: hpc-xfer.augusta.edu
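As a sketch of typical usage (the username and directory paths below are placeholders, assuming per-user directories under each file system), data can be copied in or out through the data transfer nodes with standard OpenSSH tools, and inactive data can be relocated after logging in to hpc-xfer.augusta.edu:

rsync -avP ./dataset/ jdoe@hpc-xfer.augusta.edu:/scratch/jdoe/dataset/   # stage data into the cluster
scp jdoe@hpc-xfer.augusta.edu:/work/jdoe/results.tar.gz .                # retrieve results to a local machine
mv /scratch/jdoe/old_results /work/jdoe/                                 # on hpc-xfer: move inactive data off /scratch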
Compute Node Profiles
There are six different compute node profiles in the cluster. Each profile is best suited for specific workloads, but innovation in resource utilization is encouraged.
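The resources each profile exposes through the scheduler can be inspected from a job submission node with SLURM's sinfo command; the output format string below is standard SLURM, while the partitions shown depend on the AUHPCS configuration:

sinfo -N -o "%N %P %c %m %G"   # node name, partition, CPUs, memory (MB), generic resources such as GPUs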
General Intel Compute Nodes
Model: Dell PowerEdge R440 Server
Processors: (2) Intel Xeon Silver 4210R (20 Cascade Lake cores), 2.4GHz, 13.75MB cache
Memory: 96GB DDR4-2400
Local scratch space: 960GB SSD
Number of nodes: 18
These nodes are candidates for bioinformatics, genomics, population science, mathematics, chemistry, and physics workloads with the most modest resource needs.
Middle Memory Intel Compute Nodes
Model: Dell PowerEdge R640 Server
Processors: (2) Intel Xeon Gold 5218R (40 Cascade Lake cores), 2.1GHz, 27.5MB cache
Memory: 768GB DDR4-2666
Local scratch space: 960GB SSD
Number of nodes: 8
These nodes are candidates for bioinformatics, genomics, population science, mathematics, chemistry, physics, and some modeling workloads with greater resource needs. These nodes are also likely suitable for pharmaceutical, molecular biology, and simulation workloads.
High Memory Intel Compute Nodes
Model: Dell PowerEdge R640 Server
Processors: (2) Intel Xeon Gold 5220R (48 Cascade Lake cores), 2.2GHz, 35.75MB cache
Memory: 1.53TB DDR4-2666
Local scratch space: 1.92TB SSD
Number of nodes: 2
These nodes are candidates for bioinformatics, genomics, population science, mathematics, chemistry, physics, modeling, pharmaceutical, molecular biology, and simulation workloads with the largest resource needs.
NVIDIA Quadro RTX - Intel Compute Nodes
Model: Dell PowerEdge R740XD Server
Processors: (2) Intel Xeon Gold 6246R (32 Cascade Lake cores), 3.4GHz, 35.75MB cache
GPUs: (3) NVIDIA Quadro RTX 6000 (Turing, CUDA/Tensor 27,648/3,456 cores)
Memory: 768GB DDR4-2933 (CPU), 72GB GDDR6 (GPU)
Local scratch space: 1.92TB SSD
Number of nodes: 2
These nodes are candidates for data sciences, physics and life science modeling, artificial intelligence, inference, and simulation workloads with modest resource needs. These systems also include hardware features that can accelerate complex simulations of the physical world, such as particle or fluid dynamics, for scientific and data visualization. They could also be used for film, video, and graphic rendering, or special effects workloads.
NVIDIA Tesla T4 - Intel Compute Nodes
Model: Dell PowerEdge R740XD Server
Processors: (2) Intel Xeon Gold 6246R (32 Cascade Lake cores), 3.4GHz, 35.75MB cache
GPUs: (2) NVIDIA Tesla T4 (Turing, CUDA/Tensor 10,240/1,280 cores)
Memory: 768GB DDR4-2933 (CPU), 32GB GDDR6 (GPU)
Local scratch space: 1.92TB SSD
Number of nodes: 2
These nodes are candidates for mathematics, data sciences, artificial intelligence, inference, machine learning, deep learning, and simulation workloads with modest resource needs.
NVIDIA A100 - AMD Compute Node
Model: NVIDIA DGX A100 P3687
Processors: (2) AMD EPYC 7742 (128 Rome cores), 2.25GHz, 256MB cache
GPUs: (8) NVIDIA A100 Tensor Core (Ampere, CUDA/Tensor 55,296/3,456 cores)
Memory: 1TB DDR4-3200 (CPU), 320GB HBM2e (GPU)
Local scratch space: 15TB NVMe
Number of nodes: 1
This node provides the greatest end-to-end HPC platform performance in the cluster. It offers many enhancements that deliver significant speedups for large-scale artificial intelligence, inference, deep learning, data analytics, and digital forensic workloads.
Note
The AUHPCS cluster uses the SLURM workload manager for job scheduling. For details on job submission and resource allocation, refer to the SLURM Workload Manager section.
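As a minimal sketch of a batch job script (the partition name, resource amounts, and application are placeholders; the GPU request applies only to the GPU node profiles and depends on the site's gres configuration):

#!/bin/bash
#SBATCH --job-name=example        # name shown in the queue
#SBATCH --partition=general       # placeholder partition name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00           # walltime limit (HH:MM:SS)
##SBATCH --gres=gpu:1             # uncomment to request a GPU, if configured

srun ./my_analysis input.dat      # placeholder application and input

The script would be submitted from a job submission node with sbatch job.sh and monitored with squeue -u $USER.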