cuda11.4
- Version: 11.4.2
- Category: tools
- Cluster: Loki
Description
The NVIDIA CUDA® Toolkit 11.4 provides the compiler, libraries, and development tools for GPU-accelerated computing. It continues to support the Ampere, Turing, Volta, and Pascal architectures.
Available modules:
cuda11.4/toolkit/11.4.2 Core development toolkit, including nvcc, the CUDA runtime libraries, Nsight tools, and headers
cuda11.4/blas/11.4.2 cuBLAS 11.5 — highly optimized GPU-accelerated linear algebra routines
cuda11.4/fft/11.4.2 cuFFT — Fast Fourier Transform library for GPU workloads
Highlights in CUDA 11.4:
Enhanced support for A100 GPUs and multi-instance GPU (MIG) features
Updates to cuBLAS 11.5, cuSPARSE, cuFFT, and other libraries
Improved debugging/profiling via Nsight Systems and Nsight Compute
Expanded CUDA Graph features
Faster device memory initialization via asynchronous copies (cudaMemcpyAsync)
Documentation
Examples/Usage
Load the CUDA 11.4 toolkit module:
$ module load cuda11.4/toolkit/11.4.2
Compile a CUDA application targeting Ampere:
$ nvcc -arch=sm_80 app.cu -o app
Run the compiled program:
$ ./app
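The compile command above targets Ampere through -arch=sm_80. For the other architectures named in the description, a different value is needed. The following is a minimal sketch, not a definitive mapping: the helper name arch_flag is hypothetical, and the values are the baseline compute capabilities of each family's data-center GPUs (some consumer parts differ, for example sm_61 on Pascal and sm_86 on Ampere). Check the compute capability of the actual GPU before relying on it.

```shell
# Hypothetical helper: map an architecture name to an nvcc -arch value.
# Values are baseline compute capabilities, an assumption for illustration.
arch_flag() {
    case "$1" in
        pascal) echo "sm_60" ;;
        volta)  echo "sm_70" ;;
        turing) echo "sm_75" ;;
        ampere) echo "sm_80" ;;
        *) echo "unknown architecture: $1" >&2; return 1 ;;
    esac
}

# Print (rather than run) the compile command for a chosen architecture.
echo "nvcc -arch=$(arch_flag volta) app.cu -o app"
```

The sketch only prints the command, so it can be tried on a login node without a GPU or a loaded module.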
Load cuBLAS or cuFFT separately if needed:
$ module load cuda11.4/blas/11.4.2
$ module load cuda11.4/fft/11.4.2
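Loading a library module typically only adjusts the environment; the program still has to be linked against the library at compile time. The sketch below assembles such a command line. It assumes the modules place the library directories on the linker search path (otherwise an explicit -L path is needed), and the helper name nvcc_link_cmd is hypothetical. The flags -lcublas and -lcufft are the usual link flags for cuBLAS and cuFFT.

```shell
# Hypothetical helper: build an nvcc command line that links the given libraries.
# Usage: nvcc_link_cmd SOURCE OUTPUT [LIB...]
nvcc_link_cmd() {
    src=$1
    out=$2
    shift 2
    cmd="nvcc -arch=sm_80 $src -o $out"
    for lib in "$@"; do
        cmd="$cmd -l$lib"   # one -l flag per requested library
    done
    printf '%s\n' "$cmd"
}

# Print the command for an application that uses both cuBLAS and cuFFT.
nvcc_link_cmd app.cu app cublas cufft
```

The printed line can be pasted into the shell once the toolkit and library modules are loaded.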
Inspect your environment:
$ echo $CUDA_HOME
$ echo $PATH
$ echo $LD_LIBRARY_PATH
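Reading long PATH-style variables by eye is error-prone, so a small check can help. This is a sketch under assumptions: the helper name path_contains is hypothetical, and it assumes the toolkit module sets CUDA_HOME and adds $CUDA_HOME/bin to PATH, which depends on how the modulefile on the cluster is written.

```shell
# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated list $1 (exact match on a whole entry).
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: the module exports CUDA_HOME and prepends its bin directory to PATH.
if [ -n "${CUDA_HOME:-}" ] && path_contains "${PATH:-}" "$CUDA_HOME/bin"; then
    echo "toolkit bin directory is on PATH"
else
    echo "CUDA_HOME is unset or its bin directory is not on PATH"
fi
```

The same helper can be pointed at LD_LIBRARY_PATH to confirm that a library directory is present after loading the cuBLAS or cuFFT modules.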
Unload modules:
$ module unload cuda11.4/toolkit/11.4.2
$ module unload cuda11.4/blas/11.4.2
$ module unload cuda11.4/fft/11.4.2
Installation
The CUDA 11.4 Toolkit was installed from: https://developer.nvidia.com/cuda-11.4-download-archive