We support several different classes of interactive use, to allow code development, debugging, job monitoring or post-processing.
Most simply, one can run programs straightforwardly on the command line of a login node. This is frequently sufficient for the purposes of compilation and job preparation. Note that provided you have an X server on your local machine, and you enable X-forwarding in your SSH connection (e.g. the -X or -Y options to ssh, see Connecting to CSD3 systems), then X-windows applications launched on a login node should display on your screen (but NB, never try to use xhost to make this work).
The login nodes also support VNC and X2Go remote desktops. In particular, post-processing of large data sets may involve interactive visualisation software using 3D graphics, which can be performed using VNC on the graphical login-gfx login nodes. Use of VNC with and without 3D is described in Connecting to CSD3 via TurboVNC.
The login nodes are similar in terms of hardware to the cluster compute nodes. It is possible to run small MPI jobs on the login nodes for testing purposes using shared memory. However, the login nodes are finite, shared resources and any such use must respect other users. In particular, parallel test runs must be short (i.e. seconds), use no more than four CPUs and 20GB of memory each, and should be niced (prefixed with nice -19) so as not to impact interactive responsiveness. The login nodes are not to be used to run production workload outside the batch queueing system; antisocial use of a login node will quickly annoy other users and trigger the watchdog enforcement script.
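As a minimal sketch of the niced test run described above, the following wraps a command in nice. The function name run_niced, the launcher mpirun -np 4 and the binary ./my_test are illustrative assumptions, not names taken from this documentation; use whatever launcher your MPI installation provides.

```shell
# Run a short test at the lowest scheduling priority (niceness 19).
# `nice -n 19` is the POSIX spelling of the `nice -19` prefix mentioned above.
run_niced() {
    nice -n 19 "$@"
}

# Example (hypothetical launcher and binary, not run here):
#   run_niced mpirun -np 4 ./my_test
#
# Calling `nice` with no arguments prints the niceness it was started with,
# so this gives a quick check that the wrapper took effect:
#   run_niced nice
```

The wrapper only lowers priority; keeping the run to a few seconds, four CPUs and 20GB of memory remains the user's responsibility.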
Note that interactive use of compute nodes is available as described below.
The progress of any batch job started through the scheduler can be monitored simply by logging in (using ssh) to any of the compute nodes assigned to the job (use e.g. squeue to see which these are). For example, the UNIX command top will immediately show whether there are active job processes, and what percentage of a CPU core each such process is managing to use; low percentages usually suggest a problem. Also from top one can verify the amount of memory per node that the job actually demands (see also the free command); exhausting the node memory will at minimum cause the node to start writing memory pages to swap (thus causing immediate and drastic performance degradation).
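The monitoring steps above could be sketched as follows. The helper name node_snapshot and the node name passed to it are hypothetical; the helper simply runs top once in batch mode and then free on the chosen node. Setting SSH=echo prints the remote command instead of running it, which is an illustration device rather than anything provided by the system.

```shell
# Hypothetical helper: take a one-off snapshot of CPU and memory use on a
# compute node assigned to one of your jobs.
# Set SSH=echo to print the remote command instead of running it (dry run).
node_snapshot() {
    node="$1"          # node name as reported by squeue (placeholder)
    me="$(id -un)"     # your username
    # top -b -n 1: a single batch-mode iteration; free -h: readable memory sizes
    "${SSH:-ssh}" "$node" "top -b -n 1 -u $me | head -n 20; free -h"
}

# Example (run from a login node, not run here):
#   squeue -u "$(id -un)"      # find the nodes assigned to your jobs
#   node_snapshot <node-name>
```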
The following limitations apply to compute node access:
- Access is possible only via the login nodes (not directly from external machines, although it is possible to jump through a login node, see for example the -J option to ssh in recent OpenSSH versions).
- SSH access is granted to a user when a node starts running jobs owned by that user. If a user has multiple jobs on the same node, the SSH session will be associated with the most recently started job.
- Access will be revoked when the user’s job finishes. When this occurs, all of the user’s processes on the compute node are killed.
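For the jump-host route mentioned in the first limitation, a minimal sketch follows. The helper name jump_to_node and all three arguments are placeholders for illustration; substitute your own username, the login node address given in Connecting to CSD3 systems, and a node currently running one of your jobs. The -J option requires OpenSSH 7.3 or later.

```shell
# Hypothetical helper: reach a compute node from an external machine by
# jumping through a login node.
# Set SSH=echo to print the arguments instead of connecting (dry run).
jump_to_node() {
    user="$1"     # cluster username (placeholder)
    login="$2"    # login node address (placeholder)
    node="$3"     # compute node running one of your jobs (placeholder)
    "${SSH:-ssh}" -J "${user}@${login}" "${user}@${node}"
}

# Example (not run here):
#   jump_to_node <username> <login-node-address> <node-name>
```

The connection will only succeed while one of your jobs is running on the target node, as described in the limitations above.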
Interactive jobs via the scheduler
Job monitoring already gives direct access to compute nodes allocated by the scheduler, so one could in principle submit a job which simply sleeps when started, connect to a node via ssh, and then launch processes manually from the command line (e.g. for debugging purposes). However, there is at least one more convenient method of obtaining a set of nodes for interactive use.
The following command will request two Peta4-Skylake nodes, with one CPU each, interactively for 1 hour, charged to the project MYPROJECT:
sintr -A MYPROJECT -p skylake -N2 -n2 -t 1:0:0 --qos=INTR
Note that INTR permits a maximum walltime of 1 hour, a maximum of 96 CPUs, and only 1 job per user. This command will create a new window if you have an X windows display (and X-forwarding to the login nodes is working); otherwise it will run in the current login node window. It will pause until the job starts, then create a screen session running on the first node allocated (cf. man screen). X windows applications started inside this terminal should display properly (if they could from the original login session).
Within the screen session, new terminals can be started with control-a c, and control-a n and control-a p navigate between the different terminals. srun can also be used to start processes on any of the nodes in the job allocation, and SLURM-aware MPI implementations will use this to launch remote processes on the allocated nodes without the need to give them explicit host lists. Alternatively, just SSH in from any screen terminal to any of the allocated nodes.
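As a small sketch of using srun inside the interactive session, the helper below reports the allocation and then runs hostname once per task. The function name show_allocation is hypothetical; SLURM_JOB_ID and SLURM_JOB_NODELIST are environment variables that SLURM sets inside a job. Outside a job the helper only prints a notice.

```shell
# Sketch: show where the tasks of the current allocation will run.
# Intended for use inside the interactive session; outside a SLURM job it
# prints a notice and returns a non-zero status.
show_allocation() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "not inside a SLURM job allocation"
        return 1
    fi
    echo "job ${SLURM_JOB_ID} on nodes: ${SLURM_JOB_NODELIST}"
    srun hostname    # one copy per task, each printing the node it ran on
}

# Example (inside the screen session, not run here):
#   show_allocation
```

With the two-node, two-task request shown earlier, one would expect srun hostname to print one node name per task.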
Interactive jobs on Peta4-KNL and Wilkes2-GPU can be requested in a similar way (replacing -p skylake with -p knl or -p pascal respectively, and substituting the name of the appropriate project).
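To make the substitution concrete, the sketch below prints the sintr command line for a given partition and project, changing only the -p and -A values of the Skylake example. The function name sintr_command and the project names MYPROJECT-KNL and MYPROJECT-GPU are placeholders, not real project names. This section does not say whether the other partitions need further options (for example a GPU count), so treat the printed lines as a starting point only.

```shell
# Print the sintr command line for a given partition and project.
# Only the -p and -A values change; the rest mirrors the Skylake example.
sintr_command() {
    partition="$1"   # skylake, knl or pascal
    project="$2"     # your project name (placeholder)
    echo "sintr -A ${project} -p ${partition} -N2 -n2 -t 1:0:0 --qos=INTR"
}

# Hypothetical project names, for illustration only:
sintr_command knl MYPROJECT-KNL
sintr_command pascal MYPROJECT-GPU
```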