Quick Start

Important

If you are a brand new internal user of the HPC service and have not submitted an HPC Account Application form, please visit https://www.hpc.cam.ac.uk/rcs-application. Do not attempt to log in without submitting the form.
If you are a first-time user OR have not used the system since 1st November 2022, you will need to set up your MFA. Follow the MFA guidance documentation.

First Time Accessing CSD3

All users of the HPC service will be required to have MFA set up in order to access and use the services.

It is strongly recommended that you read through the MultiFactor Authentication (MFA) user documentation to assist with the setup of your MFA.

For a video introduction to CSD3, covering how it works, who can access it and how, please visit https://www.hpc.cam.ac.uk/getting-started-csd3-tutorials

Login

All users should initially connect to CSD3 through the login nodes.

To access the CPU cluster nodes (for use with CPU compute nodes):

ssh <username>@login-cpu.hpc.cam.ac.uk

Research Computing Services use CRSids as usernames where they exist, so the <username>@ can usually be omitted if your local University of Cambridge system also uses your CRSid. Internal Cambridge users should use their UIS password (at least initially). SSH keys can also be used, but always set a passphrase.

The first time you log in, you will be asked to check that the host key fingerprints are correct. Please check that the fingerprints reported match those on the page CSD3 Host Keys before responding yes (NB not y).

There are in fact multiple individual login nodes which may be accessed directly. Logins are balanced over login-q-1 to login-q-4 (icelake) and login-p-1 to login-p-4 (cascadelake).

Note that the name login.hpc.cam.ac.uk is an alias for login-cpu.hpc.cam.ac.uk, which is itself an alias for login-csd3.hpc.cam.ac.uk.

All CSD3 nodes run a rebuild of Red Hat Enterprise Linux 8 (RHEL8) and have the same access to filesystems. The nodes vary however in the generation of their CPUs. Therefore it is recommended to develop code on the closest matching type of login node, according to the choices suggested above, and certainly to recompile applications that were previously run on a different version of RHEL.

Tip

For a video and tutorial guide for getting started on CSD3, please visit https://www.hpc.cam.ac.uk/getting-started-csd3-tutorials

Password

It is possible to change your initial password using the usual unix command passwd on a login node. University of Cambridge users should note that this will make it different from your UIS Password; see the UIS Password Management Application (https://password.raven.cam.ac.uk/) for changing the latter. Note that the security of both users’ data and the service itself depends strongly on choosing the password sensibly, which in the age of automated cracking programs unfortunately means the following:

Use at least 15 characters
Use a mixture of upper and lower case letters, numbers and at least one non-alphanumeric character
Do not use dictionary words, common proper nouns or simple rearrangements of these
Do not use family names, user identifiers, car registrations, media references, …
Do not re-use a password in use on another system (this is for damage limitation in case of a compromise somewhere).
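The length and character-class rules above can be checked mechanically, while the dictionary and re-use rules cannot. The following is a minimal sketch, assuming a POSIX shell; the function name check_password_shape is hypothetical and is not a CSD3 tool. Anything typed as an argument may end up in shell history, so treat it as an illustration of the rules rather than a way to vet a real password.

```shell
# Hypothetical helper: checks only length and character classes.
# It cannot detect dictionary words, personal names or re-used passwords.
check_password_shape() {
    pw=$1
    if [ "${#pw}" -lt 15 ]; then
        echo "too short: need at least 15 characters"; return 1
    fi
    case $pw in *[[:upper:]]*) ;; *) echo "missing upper case letter"; return 1 ;; esac
    case $pw in *[[:lower:]]*) ;; *) echo "missing lower case letter"; return 1 ;; esac
    case $pw in *[[:digit:]]*) ;; *) echo "missing number"; return 1 ;; esac
    case $pw in *[![:alnum:]]*) ;; *) echo "missing non-alphanumeric character"; return 1 ;; esac
    echo "ok"
}
```

A string that passes this check can still be a poor password; the remaining rules in the list need human judgement.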

Tip

If the UIS Password does not appear to work, please visit the UIS Password Management page and change your password. This will propagate your UIS Password to other systems such as BLUE AD which we use (it is permissible to change the password to itself provided the strength indicator shows GREEN).

Passwords should be treated like credit card numbers (and not left around, emailed or shared etc). The above rules are similar to those which apply to systems elsewhere.

Filesystems

Please see here for a summary of available filesystems and the rules governing them.

Modules

We use the modules environment extensively. A module can for instance be associated with a particular version of Intel compiler, or different MPI libraries etc. Loading a module establishes the environment required to find the related include and library files at compile-time and run-time.

By default the environment is such that the most commonly required modules are already loaded. It is possible to see which modules are loaded by using the command module list:

[abc123@login-q-1 ~]$ module list

Currently Loaded Modulefiles:
 1) dot                   5) cuda/11.4                 9) intel/libs/idb/2020.2   13) intel/bundles/complib/2020.2
 2) rhel8/slurm           6) intel/compilers/2020.2   10) intel/libs/tbb/2020.2   14) rhel8/default-icl
 3) singularity/current   7) intel/mkl/2020.2         11) intel/libs/ipp/2020.2   15) use.own
 4) rhel8/global          8) intel/impi/2020.2/intel  12) intel/libs/daal/2020.2  16) bacula/5.2.13

The above shows that Slurm (the job queueing system software), as well as the Intel compilers and the Intel MPI environment, are loaded. These are actually loaded as a result of loading the default module for the node type (rhel8/default-icl in this listing), which is loaded automatically on login.

Further commands:

module load <module>         -> load module
module unload <module>       -> unload module
module purge                 -> unload all modules
module list                  -> show currently loaded modules
module avail                 -> show available modules
module whatis                -> show available modules with brief explanation
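As an illustration of how these commands combine, the sketch below returns a session to a known state by unloading everything and reloading the default environment. It assumes the Ice Lake default module rhel8/default-icl seen in the listing above; the function name reset_modules is hypothetical.

```shell
# Sketch: return to a known-good environment before building for Ice Lake.
# rhel8/default-icl is the default module visible in the earlier listing;
# substitute the default module for the node type you are targeting.
reset_modules() {
    module purge &&                   # unload everything currently loaded
    module load rhel8/default-icl &&  # reload the default environment
    module list                       # confirm what is now loaded
}
```

Running reset_modules in an interactive shell is a quick way to rule out a stray module as the cause of a build problem.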

There may be a number of historical modules inherited from older systems that appear in response to module avail. Please avoid modules with names such as “sandybridge” and “nehalem” in their paths and prefer modules whose names carry a Spack hash suffix (for example -4qrgkot).

.bashrc

The default environment should be established automatically via the modules system and the shell initialization scripts. For example, essential system software for compilation, credit and quota management, job execution and scheduling, error-correcting wrappers and MPI recommended settings are all applied in this way. This works by setting the PATH and LD_LIBRARY_PATH environment variables, amongst others, to particular values. Please be careful if you edit your ~/.bashrc file: an incorrect edit can wreck the default settings and create many problems, potentially rendering the account unusable until an administrator intervenes. In particular, if you wish to modify PATH or LD_LIBRARY_PATH please be sure to preserve the existing settings, e.g.

export PATH=/your/custom/path/element:$PATH
export LD_LIBRARY_PATH=/your/custom/librarypath/element:$LD_LIBRARY_PATH

and don’t simply overwrite the existing values, or you will have problems. If you are trying to add directories relating to centrally-installed software, please note that there is probably a module available which can be loaded to adjust the environment correctly.
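A slightly more defensive variant of the same idea is sketched below. The helper name prepend_path is hypothetical; it keeps the existing value, skips directories that are already present, and avoids leaving an empty element when the variable was previously unset.

```shell
# Hypothetical helper: prepend a directory to a colon-separated variable
# without discarding its existing value and without adding duplicates.
prepend_path() {
    var=$1 dir=$2
    eval "cur=\${$var:-}"            # read the current value of the named variable
    case ":$cur:" in
        *":$dir:"*) return 0 ;;      # already present, nothing to do
    esac
    # ${cur:+:$cur} avoids a trailing ':' when the variable was empty;
    # an empty element in LD_LIBRARY_PATH means the current directory.
    eval "export $var=\"\$dir\${cur:+:\$cur}\""
}

prepend_path PATH /your/custom/path/element
prepend_path LD_LIBRARY_PATH /your/custom/librarypath/element
```

Because the helper is idempotent, sourcing ~/.bashrc repeatedly does not make the variables grow.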

Users who are returning to CSD3 after a few years should check their ~/.bashrc files and if necessary DELETE any pre-existing lines such as these:

module load default-impi
module load default-wilkes
module load rhel7/default-peta4

since these will now interfere with the proper environment settings on CSD3 nodes.

It is highly recommended not to load modules in ~/.bashrc, as this may cause unexpected problems.
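To find such lines quickly, something like the following can be used. It is a minimal sketch; the function name find_module_loads is hypothetical, and it only reports lines, leaving the decision to delete them to you.

```shell
# List lines in a startup file that load modules, with line numbers.
# The file argument defaults to ~/.bashrc. Commented-out lines are ignored.
find_module_loads() {
    grep -nE '^[[:space:]]*module[[:space:]]+load' "${1:-$HOME/.bashrc}"
}
```

Calling find_module_loads with no argument inspects ~/.bashrc; no output means no module load lines were found.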

Compiling

Note that the default-X module, which is loaded by default on nodes of type X, arranges for mpicc, mpif90 etc to be found and to use the recommended compilers and MPI implementation automatically when invoked. These wrapper commands supply the correct flags for compiling with the particular MPI implementation to which they belong.

When compiling code, it is usually possible to remove any direct MPI library references in your Makefile as mpicc and mpif90 will take care of these details. In the Makefile, simply set

CC=mpicc

etc, or define

export CC=mpicc

etc in the shell before compilation.

If some required libraries are missing, please let us know and we can try to install them centrally (as a module).

Running

Please note that the following resource limits apply:

On CPU, SL1 and SL2 users are limited to 4256 cores in use at any one time and a maximum wallclock runtime of 36 hours per job. On GPU, SL1 and SL2 are limited to 64 GPUs in use at any one time and a maximum wallclock runtime of 36 hours per job. SL3 users are similarly limited to 448 cores (CPU) and 32 GPUs (GPU), all with up to 12 hours per job. For more information, please see this full description of service levels (SLs).

Please see the example job submission scripts under /usr/local/Cluster-Docs/SLURM. There are example scripts for launching an MPI application on CPU or GPU via the queueing system:

/usr/local/Cluster-Docs/SLURM/slurm_submit.peta4-cclake

/usr/local/Cluster-Docs/SLURM/slurm_submit.peta4-icelake

/usr/local/Cluster-Docs/SLURM/slurm_submit.wilkes3

Copying the appropriate example file and then modifying the top section (where indicated) will create a script suitable for submission to the batch queueing system via the command sbatch.

There are also examples of submission scripts for specific applications in this documentation - see for example CASTEP or LAMMPS.

Peta4 and Wilkes3 operate the SLURM batch queueing system for managing resources. Some useful commands:

squeue      -> show the state of jobs in the queue
sinfo       -> show the state of partitions and nodes
sview       -> show cluster state in a graphical interface
scontrol show job nnnn      -> examine the job with jobid nnnn
scontrol show node nodename -> examine the node with name nodename
sbatch      -> submit an executable script to the queueing system
sintr       -> submit an interactive job to the queueing system
srun        -> run a command either as a new job or within an existing job
scancel     -> delete a job
mybalance   -> show current balance of core hour credits

Once your application is compiled, e.g. to a binary called prog, it can be submitted to the queueing system as follows (we assume it is destined for Ice Lake).

Firstly, copy the template SLURM submission script:

cp /usr/local/Cluster-Docs/SLURM/slurm_submit.peta4-icelake slurm_submit

(Note that for convenience newer users may have symbolic links to these template files in their home directories - these are read-only so making a copy is still necessary.)

Edit the copy slurm_submit, setting application to “prog” and workdir to the correct working directory. Set options to contain any desired command line options, e.g. “>outfile 2>errfile” would redirect stdout and stderr to files which could be monitored while the job runs. Note the comment lines at the head of the script:

#! Which project should be charged:
#SBATCH -A CHANGEME
#! How many whole nodes should be allocated?
#SBATCH --nodes=1
#! How many (MPI) tasks will there be in total? (<= nodes*76)
#SBATCH --ntasks=76
#! How much wallclock time will be required?
#SBATCH --time=02:00:00
#SBATCH -p icelake

These are comments to bash, but are interpreted by SLURM as requests to use 1 node, with 76 tasks in total (which because each Ice Lake node has 76 cores, completely utilizes one node), for 2 hours of wall time (i.e. actual time as measured by a clock on the wall, rather than CPU time). Finally, the following command submits the job to the queueing system:

sbatch slurm_submit

Please note that each Ice Lake node has 76 physical CPU cores, and one should normally be careful not to start more working processes or threads per node than there are CPUs per node (note that hyperthreading is disabled, so there is no distinction between “CPU core” and “CPU”).
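For budgeting, the request in the script header shown earlier (1 node, 76 tasks, 2 hours) can be turned into a rough usage estimate. The sketch below assumes that CPU usage is charged as cores multiplied by wallclock hours, which is an assumption here rather than a statement of the charging policy; the function name core_hours is hypothetical.

```shell
# Hypothetical helper: estimate CPU core hours for a whole-node job.
# Assumes usage is charged as nodes * cores per node * wallclock hours.
core_hours() {
    nodes=$1 cores_per_node=$2 hours=$3
    echo $(( nodes * cores_per_node * hours ))
}

core_hours 1 76 2    # 1 Ice Lake node for 2 hours: prints 152
```

Comparing the result with the output of mybalance gives a quick sanity check before submitting a large job.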

Furthermore the Ice Lake nodes come in two sizes - 3.4GB per CPU (256GB total RAM) and 6.8GB per CPU (512GB total RAM). These sizes are selected by the #SBATCH -p partition directive, i.e.

#SBATCH -p icelake

for 3.4GB per CPU nodes, and

#SBATCH -p icelake-himem

for 6.8GB per CPU nodes. Jobs requesting more than 3.4GB per CPU may still be submitted to the icelake partition, but may be automatically assigned extra CPUs to cover the additional memory, and will therefore be charged more usage credits. Therefore it is recommended that whenever possible jobs requiring more than 3.4GB per CPU be submitted to icelake-himem.
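The effect of a large memory request can be estimated with simple ceiling division. This is a sketch only: it uses the rounded figures quoted above (3400MB and 6800MB per CPU) and assumes the extra CPUs are the smallest number whose combined memory share covers the request. The scheduler's exact rule and limits may differ, and the function name cpus_for_memory is hypothetical.

```shell
# Hypothetical helper: CPUs needed to cover a per-task memory request.
# The default of 3400MB per CPU is the rounded icelake figure quoted above;
# the exact value enforced by the scheduler may differ.
cpus_for_memory() {
    mem_mb=$1 mb_per_cpu=${2:-3400}
    echo $(( (mem_mb + mb_per_cpu - 1) / mb_per_cpu ))   # ceiling division
}

cpus_for_memory 3000         # fits within one CPU's share: prints 1
cpus_for_memory 10000        # icelake: prints 3
cpus_for_memory 10000 6800   # icelake-himem: prints 2
```

Under these assumptions a 10GB single-task job would occupy three CPUs on icelake but only two on icelake-himem, which is why the himem partition is the better fit for such jobs.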

The cclake and cclake-himem partitions work in similar ways but the exact numbers vary according to the node hardware - see the template job submission scripts for details.

The Wilkes3-GPU (ampere) nodes each contain 4 NVIDIA A100 GPUs. It is possible to request 1, 2, 3, or 4 GPUs for a single node job, but multinode jobs will need to request 4 GPUs per node, which would be done by using the directives

#SBATCH -p ampere
#SBATCH --gres=gpu:4

(see /usr/local/Cluster-Docs/SLURM/slurm_submit.wilkes3).

The job’s status in the queue can be monitored with squeue; alternatively use qstat or showq (add -u username to focus on a particular user’s jobs).

The job can be deleted with scancel <job_id> or qdel <job_id>.

When the job finishes (in error or correctly) there will normally be one file created in the submission directory with a name of the form slurm-NNNN.out (where NNNN is the job id).
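Submission and locating the output file can be tied together in a small wrapper. The sketch below relies on the --parsable option of sbatch, which prints only the job id (optionally followed by a semicolon and the cluster name); the function name submit_job is hypothetical, and it assumes the default slurm-NNNN.out naming described above has not been overridden in the script.

```shell
# Hypothetical helper: submit a script and print the expected output file name.
submit_job() {
    # --parsable makes sbatch print "jobid" or "jobid;cluster" and nothing else
    jobid=$(sbatch --parsable "$1") || return 1
    jobid=${jobid%%;*}              # drop any ";cluster" suffix
    echo "slurm-${jobid}.out"
}
```

For example, outfile=$(submit_job slurm_submit) stores the file name, after which tail -f "$outfile" follows the job's output once it starts running.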

Problems

Please check first of all whether there is an answer to your question in the FAQ. If not, please request support with your details and job submission scripts.