Running Jobs on CSD3¶
SLURM is an open source workload management and job scheduling system. Research Computing clusters adopted SLURM in February 2014, but previously used Torque, Maui/Moab and Gold (referred to in the following simply as “PBS”) for the same purpose. Please note that there are several commands available in SLURM with the same names as in PBS (e.g. showq, qstat, qsub, qdel, qrerun) intended for backwards compatibility, but in general we would recommend using the native SLURM commands, as described below.
If you have any questions on how to run jobs on CSD3 do not hesitate to contact the support desk.
The following commands are wrappers around the underlying SLURM commands sacct and sreport, which are much more powerful.
Note that project names in SLURM are not case sensitive.
What resources do I have available to me?
This is the first question to settle before submitting jobs to CSD3. Use the command

mybalance

to show your projects, your current usage and the remaining balances in compute unit hours.
On CSD3 we are using natural compute units for each component of the facility:
- on Peta4-Skylake we are allocating and reporting in CPU core hours
- on Peta4-KNL we are allocating and reporting in KNL node hours
- on Wilkes2-GPU we are allocating and reporting in GPU hours.
We have adopted the convention that projects containing Peta4-Skylake CPU hours will end in -CPU, while those holding GPU hours for Wilkes2-GPU end in -GPU, and projects containing Peta4-KNL node hours end in -KNL.
The projects listed by mybalance are the projects you may specify in SLURM submissions either through
#SBATCH -A project
in the job submission script or equivalently on the command line with
sbatch -A project ...
Here -CPU projects should be used for Peta4-Skylake jobs, -KNL projects for Peta4-KNL and -GPU projects for Wilkes2-GPU. See the Submitting jobs section for details on submitting to each cluster.
How many core hours does some other project or user have?
gbalance -p T2BENCH-SL2-CPU
User       Usage     | Account          Usage  | Account Limit Available (hours)
---------- --------- + ---------------- ------ + ------------- ---------
xyz10              0 | T2BENCH-SL2-CPU       0 |       200,000   200,000
This outputs the total usage in core hours accumulated to date for the project, the total awarded and total remaining available (i.e. to all members). It also prints the component of the total usage due to each member.
I would like a listing of all jobs I have submitted through a certain project and between certain times
gstatement -p SUPPORT-CPU -u xyz10 -s "2017-10-01-00:00:00" -e "2017-11-22-23:59:59"
       JobID      User    Account    JobName  Partition                 End ExitCode      State  CompHrs
------------ --------- ---------- ---------- ---------- ------------------- -------- ---------- --------
      204815     xyz10 support-c+ _interact+    skylake 2017-10-20T16:20:07      0:0  COMPLETED      0.9
      261251     xyz10 support-c+ _interact+    skylake 2017-11-09T17:39:43      0:0    TIMEOUT      1.0
      262050     xyz10 support-c+ _interact+    skylake 2017-11-11T14:00:03      0:0 CANCELLED+      1.5
      262051     xyz10 support-c+ _interact+ skylake-h+ 2017-11-11T14:00:03      0:0 CANCELLED+      0.7
...
This lists the charge for each job in the CompHrs column. Since this example queries usage of a -CPU project, these are CPU core hours. Similarly, for a -GPU project they would be GPU hours, and for a -KNL project they would be node hours.
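The CompHrs column can be totalled with standard text tools. The following is a minimal sketch using awk; the sample rows from the output above stand in for a live gstatement call, and the field position of CompHrs (field 9) is assumed from the header layout shown:

```shell
# Sum the CompHrs column (field 9) of gstatement output rows.
# Sample rows stand in for a real gstatement call here.
SAMPLE='      204815     xyz10 support-c+ _interact+    skylake 2017-10-20T16:20:07      0:0  COMPLETED      0.9
      261251     xyz10 support-c+ _interact+    skylake 2017-11-09T17:39:43      0:0    TIMEOUT      1.0
      262050     xyz10 support-c+ _interact+    skylake 2017-11-11T14:00:03      0:0 CANCELLED+      1.5
      262051     xyz10 support-c+ _interact+ skylake-h+ 2017-11-11T14:00:03      0:0 CANCELLED+      0.7'
TOTAL=$(printf '%s\n' "$SAMPLE" | awk '{ total += $9 } END { printf "%.1f", total }')
echo "Total CompHrs: $TOTAL"
```

In real use the sample would be replaced by piping gstatement output through `tail -n +3` (to skip the two header lines) into the same awk command.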
I would like to add core hours to a particular member of my group
gdeposit -z 10000 -p halos-sl2-spqr1-gpu
The coordinator of the HALOS-SL2-GPU project might use this command to add 10000 GPU hours to the HALOS-SL2-SPQR1-GPU subproject assigned to the user spqr1. Note that if a compute hour limit applies to the parent of the project in the project hierarchy - i.e. if the parent project HALOS-SL2-GPU has an overall compute hour limit (which it almost certainly does) - then the global limit still applies across all per-user subprojects.
Compute hours may be added to a project by a designated project coordinator user. The compute hours available to a project can also be reduced, by depositing a negative number of hours via the --time= syntax; e.g. the following command undoes the above:
gdeposit --time=-10000 -p halos-sl2-spqr1-gpu
Sample submission scripts¶
In normal use of SLURM, one creates a batch job which is a shell script containing the set of commands to run, plus the resource requirements for the job which are coded as specially formatted shell comments at the top of the script. The batch job script is then submitted to SLURM with the sbatch command. A job script can be resubmitted with different parameters (e.g. different sets of data or variables).
Please copy and edit the sample submission scripts that can be found under
New user accounts also have symbolic links to template files in their home directories. Lines beginning #SBATCH are directives to the batch system. The rest of each directive specifies arguments to the sbatch command. SLURM stops reading directives at the first executable line, i.e. the first line that is non-blank and does not begin with #.
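As an illustrative sketch (not one of the provided templates), this rule means that any #SBATCH line placed after the first executable line is silently ignored:

```shell
#!/bin/bash
#SBATCH -p skylake           # read: appears before any executable line
MSG="directives above this line were read"   # first executable line
#SBATCH --time=01:00:00      # IGNORED: SLURM stopped reading at the line above
echo "$MSG"
```

The second #SBATCH line is just an ordinary shell comment as far as SLURM is concerned, so the job would be submitted without a time limit directive.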
The main directives to modify are as in the following:
#! Which project should be charged:
#SBATCH -A MYPROJECT-CPU
#! Which partition/cluster am I using?
#SBATCH -p skylake
#! How many nodes should be allocated? If not specified SLURM
#! assumes 1 node.
#SBATCH --nodes=2
#! How many tasks will there be in total? By default SLURM
#! will assume 1 task per node and 1 CPU per task.
#SBATCH --ntasks=64
#! How much memory in MB is required _per node_? Not setting this
#! as here will lead to a default amount per task.
#! Setting a larger amount per task increases the number of CPUs.
##SBATCH --mem=
#! How much wallclock time will be required?
#SBATCH --time=02:00:00
In particular, the name of the project is required for the job to be scheduled (use the command mybalance to check what this is for you in case of doubt). Charging is reported in units of compute hours (what these represent depends on the cluster).
See the following sections for more details on the setting of directives for each of the three CSD3 clusters.
Peta4-Skylake assigns usage in units of CPU core hours. By convention projects containing CPU core hours have names ending in -CPU.
Jobs require one of the partitions skylake or skylake-himem, i.e. either

#SBATCH -p skylake

or

#SBATCH -p skylake-himem

and will be allocated the number of CPUs required for the requested number of tasks, together with a corresponding amount of memory.
By default, the skylake partition provides 1 CPU and 5990MB of RAM per task, and the skylake-himem partition provides 1 CPU and 12040MB per task.
Requesting more CPUs per task, or more memory per task, may both increase the number of CPUs allocated (and hence the charge). It is more cost efficient to submit jobs requiring more than 5990MB per task to the skylake-himem partition since more memory per CPU is natively available there.
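To see why, consider the following sketch of the arithmetic, based on the per-CPU memory figures quoted above (an illustration, not an official charging formula): a task needing 12000MB occupies the memory of three skylake CPUs but only one skylake-himem CPU.

```shell
# Illustrative arithmetic: CPUs' worth of memory consumed by a 12000MB
# task on each partition, using the per-CPU figures quoted above.
MEM_MB=12000
SKYLAKE_MB=5990        # MB available per CPU on skylake
HIMEM_MB=12040         # MB available per CPU on skylake-himem
SKYLAKE_CPUS=$(( (MEM_MB + SKYLAKE_MB - 1) / SKYLAKE_MB ))   # ceiling division
HIMEM_CPUS=$(( (MEM_MB + HIMEM_MB - 1) / HIMEM_MB ))
echo "skylake: $SKYLAKE_CPUS CPUs, skylake-himem: $HIMEM_CPUS CPU"
```

So the same 12000MB-per-task job would be charged roughly three times as many CPU core hours on skylake as on skylake-himem.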
NB Hyperthreading is disabled on the Skylake nodes so there is no distinction between CPUs and cores.
Peta4-KNL assigns usage in units of KNL node hours. By convention projects containing KNL node hours have names ending in -KNL.
Jobs require the partition knl, i.e.
#SBATCH -p knl
and will be allocated entire KNL nodes. Each KNL node has 64 physical cores but presents 256 cpus via hyperthreading, has 96GB DDR RAM plus 16GB MCDRAM high bandwidth memory and has been configured in quadrant/cache mode by default (in cache mode, the MCDRAM works invisibly as cache).
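The relationship between the physical cores and the CPUs SLURM reports can be sketched as simple arithmetic (hyperthreading presents 4 hardware threads per core):

```shell
# KNL node geometry as described above: 64 physical cores, each with
# 4 hardware threads, appear to SLURM as 256 logical CPUs.
CORES=64
THREADS_PER_CORE=4
LOGICAL_CPUS=$(( CORES * THREADS_PER_CORE ))
echo "Each KNL node presents $LOGICAL_CPUS CPUs"
```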
It is possible to vary the MCDRAM mode required at job submission time - please use either --constraint or the equivalent -C sbatch option to select the mode. We recommend using either cache mode (the default) or flat mode. Flat mode makes the MCDRAM visible as a second 16GB NUMA zone. Please note that hybrid MCDRAM mode, and any NUMA mode other than quad(rant), are not recommended.
Wilkes2-GPU assigns usage in units of GPU hours. By convention projects containing GPU hours have names ending in -GPU.
Jobs require the partition pascal, i.e.
#SBATCH -p pascal
and may request any number of GPUs per node in the range 1 to 4, which is done via the directive

#SBATCH --gres=gpu:N

where 1 <= N <= 4.
Each GPU node contains 4 NVIDIA Pascal P100 GPUs, with 96GB RAM and a single 12-core Broadwell processor.
Any jobs requesting more than one node must request 4 GPUs per node. Jobs less than one node in size will be prevented from requesting more than 3 CPUs per GPU. The enforcement is performed by a job submission filter which will produce an explanatory message if it rejects a job outright.
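Putting these rules together, a minimal multi-node Wilkes2-GPU job header might look like the following sketch (MYPROJECT-GPU and the trailing lines are placeholders; as noted above, a multi-node job must take all 4 GPUs on each node):

```shell
#!/bin/bash
#SBATCH -A MYPROJECT-GPU     # placeholder project name; check yours with mybalance
#SBATCH -p pascal
#SBATCH --nodes=2
#SBATCH --gres=gpu:4         # multi-node jobs must request all 4 GPUs per node
#SBATCH --time=04:00:00
# The total GPU count (and hence the GPU hours charged per hour of
# runtime) follows from the request above.
NODES=2
GPUS_PER_NODE=4
TOTAL_GPUS=$(( NODES * GPUS_PER_NODE ))
echo "Job will use $TOTAL_GPUS GPUs across $NODES nodes"
```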
Submitting the job to the queuing system¶
The command sbatch is used to submit jobs, e.g.

sbatch submission_script
The command will return a unique job identifier, which is used to query and control the job and to identify output. See the man page (man sbatch) for more options.
The following more complex example submits a job array with index values between 1 and 7 with a step size of 2 (i.e. 1, 3, 5 and 7) to the project STARS-SL2-CPU:
sbatch --array=1-7:2 -A STARS-SL2-CPU submission_script
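Inside the job script, each array element can use the SLURM_ARRAY_TASK_ID environment variable to select its own work. A minimal sketch, assuming a hypothetical input_<N>.dat naming scheme (the variable defaults to 1 here so the snippet also runs outside SLURM):

```shell
#!/bin/bash
#SBATCH -A STARS-SL2-CPU
#SBATCH -p skylake
#SBATCH --time=01:00:00
# SLURM sets SLURM_ARRAY_TASK_ID to this element's index (1, 3, 5 or 7
# for the submission above); default to 1 when run outside SLURM.
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}
INPUT="input_${TASK_ID}.dat"     # hypothetical per-element input file
echo "Array task $TASK_ID would process $INPUT"
```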
To cancel a job (either running or still queued) use scancel:

scancel <jobid>

The <jobid> is printed when the job is submitted; alternatively, use the commands squeue, qstat or showq to obtain the job ID.