Loki User Guide

This guide provides essential information for accessing and using the Loki cluster, including job submission, queue monitoring, resource allocation, and module management.

Batch Submission

To submit a batch job, follow these steps:

1. Log in to the cluster:

> ssh lskywalker@hpc-sub.augusta.edu

Example output:

[lskywalker@Loki_batch_submission ~]$

2. Check system status:

[lskywalker@Loki_batch_submission ~]$ sinfo
[lskywalker@Loki_batch_submission ~]$ squeue

Example output:

JOBID    PARTITION   NAME       USER        ST     TIME      NODES  NODELIST(REASON)
121277  cpu_middl   cosmx-RN   hsolo        R   17:11:00       1       cnode019
121294  cpu_norma   1DLN_GEX   lorgana      R    4:21:30       1       cnode001

To check your jobs specifically:

[lskywalker@Loki_batch_submission ~]$ squeue -u $USER
JOBID  PARTITION     NAME    USER      ST       TIME  NODES NODELIST(REASON)
121322 cpu_norma     bash  lskywalker   R       0:58      1     cnode016
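The queue listing above can also be read from a script. The following is a minimal sketch, not part of the Loki tooling: it assumes `squeue` is on the PATH and asks for pipe-delimited output with the standard format specifiers `%i`, `%P`, `%j`, `%u`, `%t`, `%M`, `%D`, and `%R`. The function names `parse_squeue` and `my_jobs` are hypothetical.

```python
import subprocess

# Keys follow the column order of the squeue output shown above.
FIELDS = ["jobid", "partition", "name", "user", "state", "time", "nodes", "nodelist_reason"]

# %i job ID, %P partition, %j job name, %u user, %t state,
# %M time used, %D node count, %R node list or pending reason
SQUEUE_FORMAT = "%i|%P|%j|%u|%t|%M|%D|%R"


def parse_squeue(text):
    """Turn pipe-delimited squeue output into a list of dicts, one per job."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split("|", len(FIELDS) - 1)
        if len(parts) != len(FIELDS):
            continue  # skip lines that do not have all eight fields
        jobs.append(dict(zip(FIELDS, parts)))
    return jobs


def my_jobs(user):
    """Run squeue for one user; only works on a host where squeue is installed."""
    result = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", SQUEUE_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_squeue(result.stdout)
```

On a login node, `my_jobs("lskywalker")` would return one dictionary per queued or running job. A job name that itself contains `|` would confuse this simple parser.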

To view job details:

[lskywalker@Loki_batch_submission ~]$ scontrol show job 121322
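`scontrol show job` prints its details as `Key=Value` pairs. As a rough sketch, the pairs can be collected into a dictionary; this assumes the one-line form produced by `scontrol -o show job`, and the function names `parse_scontrol` and `show_job` are hypothetical.

```python
import subprocess


def parse_scontrol(text):
    """Collect Key=Value tokens into a dict.

    Values that contain spaces are split by this simple approach, and the
    stray pieces without an '=' sign are dropped.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def show_job(job_id):
    """Run scontrol for one job; only works where scontrol is installed."""
    result = subprocess.run(
        ["scontrol", "-o", "show", "job", str(job_id)],
        capture_output=True, text=True, check=True,
    )
    return parse_scontrol(result.stdout)
```

For example, `show_job(121322).get("JobState")` would give the job's current state on a system where the job is still known to the scheduler.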

To check job resource consumption:

While the job is running (sstat reports per-step usage and has no Elapsed field; use squeue or sacct for elapsed time):

[lskywalker@Loki_batch_submission ~]$ sstat -j 121322.batch --format=JobID,MaxRSS,AveCPU

After the job is done:

[lskywalker@Loki_batch_submission ~]$ sacct -j 121322 --format=JobID,JobName,MaxRSS,Elapsed,State
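To compare the memory a job actually used with what was requested, the sacct output can be parsed in a script. This is a sketch under assumptions: it adds sacct's `--parsable2` flag to get pipe-delimited output, treats a MaxRSS value without a unit suffix as bytes, and uses the hypothetical function names `rss_to_mib`, `parse_sacct`, and `job_usage`.

```python
import subprocess

_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def rss_to_mib(value):
    """Convert a sacct memory string such as '20480K' to MiB; empty gives None."""
    value = value.strip()
    if not value:
        return None
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return float(value[:-1]) * _UNITS[suffix] / 1024**2
    return float(value) / 1024**2  # assumption: no suffix means bytes


def parse_sacct(text):
    """Parse --parsable2 output (header line first) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split("|")
    return [dict(zip(header, line.split("|"))) for line in lines[1:]]


def job_usage(job_id):
    """Run sacct for one job; only works where sacct is installed."""
    result = subprocess.run(
        ["sacct", "-j", str(job_id),
         "--format=JobID,JobName,MaxRSS,Elapsed,State", "--parsable2"],
        capture_output=True, text=True, check=True,
    )
    return parse_sacct(result.stdout)
```

MaxRSS is usually empty on the top-level job line and filled in on the step lines (such as `.batch`), so `rss_to_mib` returns `None` for the former.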

3. Create a Python script (batch_scitkit.py):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Generate a small dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a simple logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Print accuracy
print(f"Interactive Model Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")

4. Create a batch submission script (submit_quick_job.sh):

#!/bin/bash
#SBATCH --job-name=scitkit_job          # Job name
#SBATCH --output=scitkit_output_%j.txt  # Output log file
#SBATCH --ntasks=1                      # Number of tasks
#SBATCH --cpus-per-task=1               # CPU cores per task
#SBATCH --mem=1G                        # Minimal memory
#SBATCH --time=00:01:00                 # 1-minute runtime
#SBATCH --partition=cpu_normal_q        # SLURM partition

# Load Python
module load Python/3.10.8-GCCcore-12.2.0
module load scikit-learn/1.1.2-foss-2022a

# Run the Python script
python batch_scitkit.py
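The resources requested in the `#SBATCH` lines are visible to the running script through environment variables. SLURM sets `SLURM_CPUS_PER_TASK` when `--cpus-per-task` is given and `SLURM_JOB_ID` for every job. A minimal sketch of reading them from Python follows; the helper names `allocated_cpus` and `job_label` are hypothetical.

```python
import os


def allocated_cpus(environ=os.environ, default=1):
    """Return the CPU count from SLURM_CPUS_PER_TASK, or `default` outside a job."""
    raw = environ.get("SLURM_CPUS_PER_TASK", "")
    try:
        cpus = int(raw)
    except ValueError:
        return default
    return cpus if cpus > 0 else default


def job_label(environ=os.environ):
    """Return the SLURM job ID as a string, or a placeholder outside a job."""
    return environ.get("SLURM_JOB_ID", "no-slurm-job")
```

A script could pass `allocated_cpus()` to scikit-learn estimators that accept an `n_jobs` argument, so the worker count follows the allocation instead of being hard-coded.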

5. Submit the job:

[lskywalker@Loki_batch_submission ~]$ sbatch submit_quick_job.sh
[lskywalker@Loki_batch_submission ~]$ squeue -u $USER
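When submission is scripted, the job ID is needed for later `squeue`, `sstat`, and `sacct` calls. sbatch reports it in a line of the form `Submitted batch job <id>`. The sketch below extracts that number; it assumes sbatch is on the PATH, and `parse_job_id` and `submit` are hypothetical helper names.

```python
import re
import subprocess

_SUBMITTED = re.compile(r"Submitted batch job (\d+)")


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's confirmation message."""
    match = _SUBMITTED.search(sbatch_output)
    if match is None:
        raise ValueError(f"no job ID found in: {sbatch_output!r}")
    return int(match.group(1))


def submit(script_path):
    """Submit a batch script and return its job ID; needs sbatch installed."""
    result = subprocess.run(
        ["sbatch", script_path],
        capture_output=True, text=True, check=True,
    )
    return parse_job_id(result.stdout)
```

For example, `submit("submit_quick_job.sh")` would return the integer job ID on a login node.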

6. Check the generated files:

[lskywalker@Loki_batch_submission ~]$ ll

Example output:

-rw-r--r-- 1 lskywalker auhpcs_jedi_g  683 Mar 13 13:28 batch_scitkit.py
-rw-r--r-- 1 lskywalker auhpcs_jedi_g    0 Mar 13 09:16 scitkit_output_121394.txt
-rw-r--r-- 1 lskywalker auhpcs_jedi_g  835 Mar 12 15:07 submit_quick_job.sh
[lskywalker@Loki_batch_submission ~]$ cat scitkit_output_121394.txt

Example output:

Interactive Model Accuracy: 83.00%
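The output file name comes from the `--output=scitkit_output_%j.txt` line in the submission script, where SLURM replaces `%j` with the job ID. As an illustration, the sketch below expands such a pattern and reads the file if it exists. It handles only `%j`, `%x` (job name), and `%u` (user name) out of the patterns SLURM supports, and the function names are hypothetical.

```python
import re
from pathlib import Path


def expand_output_pattern(pattern, job_id, job_name="", user=""):
    """Expand %j, %x and %u in a SLURM output file name pattern."""
    replacements = {"%j": str(job_id), "%x": job_name, "%u": user}
    return re.sub(r"%[jxu]", lambda m: replacements[m.group(0)], pattern)


def read_job_output(pattern, job_id, directory="."):
    """Return the text of a job's output file, or None if it is not there yet."""
    path = Path(directory) / expand_output_pattern(pattern, job_id)
    return path.read_text() if path.exists() else None
```

With the values from this guide, `expand_output_pattern("scitkit_output_%j.txt", 121394)` gives `scitkit_output_121394.txt`.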

Interactive Job Execution

1. Log in to the interactive node:

> ssh hsolo@hpc-inter-sub.augusta.edu

2. Request interactive resources:

[hsolo@Loki_inter_submission ~]$ salloc --nodes=1 --ntasks=1 --cpus-per-task=2 --time=02:00:00 --partition=interactive_q

Example output:

salloc: Granted job allocation 121322

[hsolo@Loki_inter_submission ~]$ squeue -u $USER
JOBID   PARTITION     NAME    USER      ST       TIME  NODES NODELIST(REASON)
121322 interactive     bash    hsolo     R       2:00      1    cnode016
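The `--time` value passed to salloc (and to `#SBATCH --time`) accepts several layouts: minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, and days-hours:minutes:seconds. The following sketch converts any of these to seconds, which can help when checking a request before submitting it. The function name `slurm_time_to_seconds` is hypothetical.

```python
def slurm_time_to_seconds(text):
    """Convert a SLURM time limit string to seconds.

    Accepted forms: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S.
    """
    rest = text.strip()
    days = 0
    if "-" in rest:
        day_part, rest = rest.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        # After a day count the fields read hours[:minutes[:seconds]].
        parts += [0] * (3 - len(parts))
        hours, minutes, seconds = parts
    else:
        parts = [int(p) for p in rest.split(":")]
        if len(parts) == 1:
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unrecognized time limit: {text!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

The two-hour request above, `02:00:00`, corresponds to 7200 seconds, and the batch example's `00:01:00` to 60 seconds.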

3. Start an interactive shell on the allocated compute node:

[hsolo@Loki_inter_submission ~]$ srun --jobid 121322 --pty /bin/bash -i

You are now in a shell on the allocated compute node; the prompt changes to show the node name.

4. Load necessary modules:

[hsolo@cnode016 ~]$ module list
[hsolo@cnode016 ~]$ module avail scikit-learn
[hsolo@cnode016 ~]$ module load scikit-learn/1.2.1-gfbf-2022b
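After loading modules, it can be worth confirming that Python actually sees the packages before starting a longer run. A small sketch using only the standard library is shown below; `missing_packages` is a hypothetical helper name, and it expects top-level import names such as `sklearn` rather than module names such as `scikit-learn`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

On the compute node, `missing_packages(["sklearn"])` returning an empty list suggests the `module load` step took effect for the current Python interpreter.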

5. Create an interactive Python script (interactive_scikit.py), for example with the same contents as batch_scitkit.py from the batch section:

[hsolo@cnode016 ~]$ vi interactive_scikit.py

6. Run the Python script:

[hsolo@cnode016 ~]$ python interactive_scikit.py

Example output:

Interactive Model Accuracy: 83.00%

7. Handle job expiry:

If the session runs past the time limit requested with --time, SLURM revokes the allocation and prints a message like the following:

[hsolo@cnode016 ~]$ salloc: Job 121322 has exceeded its time limit and its allocation has been revoked.

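8. To log out of the interactive session, exit the compute-node shell. If the salloc shell is still open on the login node afterwards, run exit again to release the allocation: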

[hsolo@cnode016 ~]$ exit

Note

For graphical MATLAB sessions, see Run MATLAB from the Loki cluster over an SSH tunnel.


For further assistance, visit our Support Page or contact our team at auhpcs_support@augusta.edu.