PyTorch

The PyTorch website describes the library as:

Tensors and Dynamic neural networks in Python with strong GPU acceleration.

Setting up PyTorch on CSD3

The Python installation depends on the target partition (CPU or GPU). To set up PyTorch for a CPU partition (cclake, icelake, sapphire), please do the following:

#!/bin/bash

# load the relevant cluster environment and a recent python built for the cluster:
module purge
module load rhel8/default-ccl # recommended for all RHEL8 CPU partitions (cclake, icelake, sapphire)
module load python/3.11.9/gcc/nptrdpll 

# create and activate a virtual environment:

python -m venv ./pytorch-cclake-env
source ./pytorch-cclake-env/bin/activate

# install pytorch into your virtual env (instructions from Intel: https://pytorch-extension.intel.com/installation?platform=cpu&version=v2.7.0%2Bcpu&os=linux%2Fwsl2&package=pip)
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python -m pip install intel-extension-for-pytorch
python -m pip install oneccl_bind_pt --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
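To confirm the installation succeeded, a quick check along the following lines can be run from the same activated venv (a sketch; the module names are simply those installed by the pip commands above):

```python
import importlib.util

# Check that each package installed above resolves inside the venv.
# (find_spec only locates a module; it does not import or run it.)
for mod in ("torch", "torchvision", "torchaudio", "intel_extension_for_pytorch"):
    found = importlib.util.find_spec(mod) is not None
    print(f"{mod}: {'ok' if found else 'MISSING'}")
```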

To set up PyTorch for a GPU partition (ampere), please do the following:

#!/bin/bash

# load the relevant cluster environment and a recent python built for the cluster:
module purge
module load rhel8/default-amp
module load python/3.8.11/gcc-9.4.0-yb6rzr6

# create and activate a virtual environment:
python -m venv ./pytorch-ampere-env
source ./pytorch-ampere-env/bin/activate

# install pytorch into your virtual env:
python -m pip install torch torchvision torchaudio
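As with the CPU install, a quick resolution check can be run from the activated venv (a sketch; note that login nodes have no GPU, so any real `torch.cuda.is_available()` check should be run on a compute node):

```python
import importlib.util

# Confirm the GPU wheels resolve in the venv; find_spec does not import torch.
installed = {m: importlib.util.find_spec(m) is not None
             for m in ("torch", "torchvision", "torchaudio")}
print(installed)
```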

Running a PyTorch program

PyTorch can run on both CPU and GPU partitions; example submission scripts for each are given below.
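A minimal, hypothetical `your_script.py` that works on either partition might look like the following (a generic sketch, not CSD3-specific: it selects the GPU when one is visible to the job and falls back to the CPU otherwise):

```python
import torch

# Pick the GPU if the job was granted one, otherwise run on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A trivial computation to show the chosen device being used end-to-end.
x = torch.randn(4, 4, device=device)
y = (x @ x.T).sum()
print(f"device={device.type}, result={y.item():.4f}")
```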

CPU submission script:

#!/bin/bash
#SBATCH --account MYACCOUNT-CPU
#SBATCH --partition cclake
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 8
#SBATCH --time 00:30:00
#SBATCH -o pytorch.out
#SBATCH -e pytorch.err

# Module setup: cluster environment and recent python.
module purge
module load rhel8/default-ccl

# Activate relevant python virtual environment.
source pytorch-cclake-env/bin/activate

# Let PyTorch use all available CPU cores for OpenMP, MKL and MKL-DNN backends
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Replace 'your_script.py' with the path to your own script
python your_script.py
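The two export lines above mirror the Slurm allocation into the threading libraries. The same logic, with a fallback for interactive runs where Slurm's variables are unset, can be sketched in Python (stdlib only; the variable names are the ones used in the script above):

```python
import os

# Inside a Slurm job the scheduler exports SLURM_CPUS_PER_TASK;
# outside one, fall back to the machine's core count.
cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", os.cpu_count() or 1))

# Only set the thread counts if the batch script has not already done so.
os.environ.setdefault("OMP_NUM_THREADS", str(cpus))
os.environ.setdefault("MKL_NUM_THREADS", str(cpus))
print(cpus)
```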

GPU submission script:

#!/bin/bash
#SBATCH --account MYACCOUNT-GPU
#SBATCH --partition ampere
#SBATCH --nodes 1
# Change to gpu:4 to use all 4 GPU cards on a GPU node.
#SBATCH --gres=gpu:1
#SBATCH --time 00:30:00
#SBATCH -o pytorch.out
#SBATCH -e pytorch.err

# Module setup: cluster environment and recent python.
module purge
module load rhel8/default-amp

# Activate relevant python virtual environment.
source pytorch-ampere-env/bin/activate

# Replace 'your_script.py' with the path to your own script
python your_script.py
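Inside a GPU job, Slurm typically exports CUDA_VISIBLE_DEVICES to match the --gres request (e.g. "0" for one card, "0,1,2,3" for four), and PyTorch only enumerates those devices. A stdlib-only sketch of inspecting it from the job script:

```python
import os

# CUDA_VISIBLE_DEVICES is usually set by Slurm inside a GPU job;
# outside a job (e.g. on a login node) it is typically unset.
visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
gpu_ids = [d for d in visible.split(",") if d]
print(f"{len(gpu_ids)} GPU(s) visible: {gpu_ids}")
```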