Ice Lake Nodes

These new nodes will enter general service in October 2021.

Comparison with skylake and cclake nodes

There are some important differences between the Skylake (skylake), Cascade Lake (cclake) and Ice Lake nodes that you need to be aware of:

  • The Ice Lake nodes are named according to the scheme cpu-q-[1-412].
  • The Ice Lake nodes are in separate icelake and icelake-himem Slurm partitions. Your existing -CPU projects will be able to submit jobs to these.
  • The icelake nodes have 76 cpus (1 cpu = 1 core), and 256 GiB of RAM. This means that they have 3380 MiB per cpu, compared to 5980 MiB per cpu in the skylake partition and 3420 MiB per cpu in the cclake partition.
  • The icelake-himem nodes have 76 cpus (1 cpu = 1 core), and 512 GiB of RAM. This means that they have 6760 MiB per cpu, compared to 12030 MiB per cpu in the skylake-himem partition and 6840 MiB per cpu in the cclake-himem partition.
  • The Ice Lake nodes are interconnected by Mellanox HDR200 InfiniBand, rather than Omni-Path (skylake) or Mellanox HDR100 InfiniBand (cclake).
  • The Ice Lake nodes run CentOS 8, whereas skylake and cclake run CentOS 7.

Recommendations for running on icelake

Since the cpu-q nodes run CentOS 8, you should recompile your applications for them. We suggest doing this in an interactive session on an icelake node, requested using sintr:

sintr -t 4:0:0 -N1 -n38 -A YOURPROJECT-CPU -p icelake
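Once the interactive session has started on a cpu-q node, a rebuild might look like the following sketch. It assumes a Makefile-based project (the make commands are placeholders for your own build steps) and loads the default Ice Lake module named in the MPI section of this page.

```bash
# Run these inside the sintr session, i.e. on a cpu-q node
module purge
module load rhel8/default-icl

# Hypothetical Makefile-based build: substitute your own build steps
make clean
make
```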

The per-job wallclock time limits are currently the same as on the other partitions: 36 hours for SL1/2 and 12 hours for SL3.

The per-job, per-user cpu limits are now 2240 cpus for SL1/2 and 448 cpus for SL3.

These limits should be regarded as provisional and may be revised.
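To see what the cpu limits mean in whole nodes, divide them by the 76 cpus of an icelake node. The bash sketch below does this arithmetic; `max_full_icelake_nodes` is a hypothetical helper, not a cluster tool, and it assumes jobs use full 76-cpu nodes.

```bash
# Hypothetical helper: how many whole icelake nodes (76 cpus each) fit
# under a per-job cpu limit. Shell integer division rounds down.
CPUS_PER_ICELAKE_NODE=76

max_full_icelake_nodes() {
    local cpu_limit=$1
    echo $(( cpu_limit / CPUS_PER_ICELAKE_NODE ))
}

# Provisional limits quoted above
echo "SL1/2: $(max_full_icelake_nodes 2240) full nodes"   # 2240 / 76 -> 29
echo "SL3:   $(max_full_icelake_nodes 448) full nodes"    # 448 / 76 -> 5
```

With these figures, an SL3 job fits on at most 5 full icelake nodes (380 cpus), since a sixth node would bring it to 456 cpus.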

Default submission script for icelake

You should find a template submission script modified for the icelake nodes at:

/usr/local/Cluster-Docs/SLURM/slurm_submit.peta4-icelake

This is set up for non-MPI jobs using icelake, but can be modified for other types of job. If you prefer to modify your existing job scripts, please see the following sections for guidance.

Jobs not using MPI and requiring no more than 3380 MiB per cpu

In this case you can simply specify the icelake partition with the -p sbatch directive, e.g.:

#SBATCH -p icelake

will submit a job able to run on the first nodes available in the icelake partition.
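For illustration, a complete minimal script for a single-task, non-MPI job on icelake might look like the sketch below. The job name, the wallclock time and ./my_application are placeholders, YOURPROJECT-CPU must be replaced by your own project, and the template script mentioned earlier remains the recommended starting point.

```bash
#!/bin/bash
# Placeholder job name
#SBATCH -J my_serial_job
# Your existing -CPU project
#SBATCH -A YOURPROJECT-CPU
# Ice Lake partition: 3380 MiB per cpu
#SBATCH -p icelake
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Must stay within 36 hours (SL1/2) or 12 hours (SL3)
#SBATCH --time=02:00:00

# Default Ice Lake environment, as named on this page
module purge
module load rhel8/default-icl

# Hypothetical application, rebuilt on an icelake node
./my_application
```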

Jobs requiring more than 3380 MiB per cpu

For larger memory requirements, it is most efficient to submit to the icelake-himem partition instead, which allocates 6760 MiB per cpu:

#SBATCH -p icelake-himem

If this amount of memory per cpu is insufficient you will need to specify either the --mem= or --cpus-per-task= directive, in addition to the -p directive, in order to make sure you have enough memory at run time. Note that in this case, Slurm will satisfy the memory requirement by allocating (and charging for) more cpus if necessary. E.g.:

#SBATCH -p icelake-himem
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=8000

In the above example we request 8000 MiB of memory per node, but only one task. Slurm usually allocates one cpu per task, but here, because it enforces 6760 MiB per cpu for icelake-himem, it allocates 2 cpus to the single task so that the job receives the 8000 MiB it requests. Note that this increases the number of cpu core hours consumed by the job, and hence the charge. Also note that since the two cpus together provide 2 × 6760 = 13520 MiB, the user would lose nothing by requesting 13000 MiB instead of 8000 MiB here.
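The partition choice and memory-to-cpu arithmetic described in this section can be sketched in bash as follows. This only illustrates the rule as stated on this page, assuming the per-cpu figures of 3380 MiB (icelake) and 6760 MiB (icelake-himem); `icelake_request` is a hypothetical name, and Slurm itself decides what is actually allocated.

```bash
# Hypothetical helper (not a cluster tool): given the memory one task needs
# in MiB, print the partition recommended on this page and the number of
# cpus Slurm would need to allocate to cover that memory.
# Assumed per-cpu figures, as quoted on this page:
ICELAKE_MIB_PER_CPU=3380
ICELAKE_HIMEM_MIB_PER_CPU=6760

icelake_request() {
    local mem_mib=$1
    local cpus
    if [ "$mem_mib" -le "$ICELAKE_MIB_PER_CPU" ]; then
        # Fits in the default per-cpu allocation of icelake
        echo "icelake 1"
    else
        # Ceiling division: smallest cpu count with cpus * 6760 >= mem_mib
        cpus=$(( (mem_mib + ICELAKE_HIMEM_MIB_PER_CPU - 1) / ICELAKE_HIMEM_MIB_PER_CPU ))
        echo "icelake-himem $cpus"
    fi
}

icelake_request 3000    # icelake 1
icelake_request 8000    # icelake-himem 2
icelake_request 13000   # icelake-himem 2
icelake_request 14000   # icelake-himem 3
```

The 8000 and 13000 MiB requests both round up to two cpus, so they are charged the same; only beyond 13520 MiB does a third cpu become necessary.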

Jobs requiring MPI

We recommend using Intel MPI 2020, which is a newer version of Intel MPI than the one currently provided in the skylake environment. There are other, related changes to the default environment seen by jobs running on the icelake nodes. If you wish to recompile or test against this new environment, we recommend requesting an interactive session and working directly on an icelake node.

For reference, the default environment on the icelake (cpu-q) nodes is provided by loading a module as follows:

module purge
module load rhel8/default-icl

However, since the CPU type on cpu-q differs from that on the login-cpu and login-gpu nodes, and its operating system is a later version than elsewhere, we do not recommend building software intended to run on cpu-q on any other type of node.
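As a sketch, an MPI job filling two icelake nodes might be submitted with a script like the one below. It assumes the mpirun launcher provided by Intel MPI through rhel8/default-icl, one MPI rank per cpu, and a hypothetical application ./my_mpi_app already built on an icelake node; the launcher and options recommended for this cluster may differ, so compare with the template script before relying on it.

```bash
#!/bin/bash
#SBATCH -J my_mpi_job
#SBATCH -A YOURPROJECT-CPU
#SBATCH -p icelake
#SBATCH --nodes=2
# 76 cpus per icelake node, one MPI rank per cpu
#SBATCH --ntasks-per-node=76
#SBATCH --time=02:00:00

module purge
module load rhel8/default-icl

# 2 nodes x 76 ranks per node = 152 ranks (assumed Intel MPI mpirun)
mpirun -np 152 ./my_mpi_app
```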