Connecting to CSD3¶
Tip
For a video and tutorial guide for getting started on CSD3, please visit https://www.hpc.cam.ac.uk/getting-started-csd3-tutorials
CSD3 has several login nodes, for reliability and to share the interactive workload; there are also specific login nodes associated with each generation of hardware. For reasons of security it is necessary to use SSH in order to connect - any Linux distribution will contain the ssh client, which can be used for interactive logins, together with scp, sftp and rsync, which employ SSH natively for file transfers. If your Linux/MacOSX/UNIX machine does not have ssh, please first check your distribution for it (ensuring you get the most recent version available). Recent versions of Microsoft Windows also include the OpenSSH ssh command-line client as an optional feature (alternatively, PuTTY is often used on Microsoft Windows).
To access the CPU cluster login nodes (for use with the CPU compute nodes):
ssh <username>@login-cpu.hpc.cam.ac.uk
To use the icelake CPU login nodes in particular - these are also suitable for working with the production GPU cluster (ampere GPU nodes):
ssh <username>@login-icelake.hpc.cam.ac.uk
The first time you login, you will be asked to check that the host key fingerprints are correct. Please check that the fingerprints reported match those on the page CSD3 Host Keys before responding yes (NB not y).
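Once a host key has been accepted, OpenSSH caches it in ~/.ssh/known_hosts and you can re-display the stored fingerprint at any time with ssh-keygen. The following is an illustrative sketch using standard OpenSSH tools; the throwaway key is generated purely so the commands run anywhere:

```shell
# Re-display the cached fingerprint for a host, to compare against the CSD3 Host Keys page:
#   ssh-keygen -l -F login-cpu.hpc.cam.ac.uk
# Demonstration of the fingerprint format using a disposable key:
tmp=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$tmp/demo_key"   # generate a throwaway key pair
ssh-keygen -lf "$tmp/demo_key.pub"                  # prints e.g. "256 SHA256:... (ED25519)"
rm -rf "$tmp"
```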
There are in fact multiple individual login nodes which may be accessed directly. Logins are balanced over login-q-1 to login-q-4 (icelake) and login-p-1 to login-p-4 (cascadelake).
Note that the name login.hpc.cam.ac.uk is an alias for login-cpu.hpc.cam.ac.uk, which is itself an alias for login-csd3.hpc.cam.ac.uk.
Note that you may need to add the -X option to the above ssh command in order to enable transparent forwarding of X applications to your local screen (this assumes that you have an X server running on your local machine). The OpenSSH version of ssh sometimes requires that you use -Y instead of -X (try -Y if applications don't appear or die with errors). These options to ssh direct the X communication through an encrypted tunnel, and should 'just work'.
In the bad old days before SSH, the highly insecure method of persuading X windows to display on remote screens involved commands such as xauth and xhost. You do not, and never will, need to use either of these with SSH; if you try to use them you may expose everything you type (including passwords) to be read, and even changed, by evil-doers. Do not try these methods - they are both redundant and dangerous.
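For example (illustrative only - substitute your own username; note that ssh -G merely prints the options ssh would use for a given destination, without making any connection):

```shell
# X11 forwarding variants (both require an X server on your local machine):
#   ssh -X <username>@login-icelake.hpc.cam.ac.uk   # untrusted X11 forwarding
#   ssh -Y <username>@login-icelake.hpc.cam.ac.uk   # trusted; try this if -X misbehaves
# Confirm what -Y actually sets, without connecting:
ssh -G -Y login-icelake.hpc.cam.ac.uk | grep forwardx11
```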
Similarly, incoming connections to CSD3 using scp, sftp or rsync are accepted. The same in the outward direction should also work, provided the other system does not block SSH.
Password¶
It is possible to change your initial password using the usual unix command passwd on a login node. University of Cambridge users should note that this will make it different to your UIS Password - see the UIS Password Management Application (https://password.raven.cam.ac.uk/) for changing the latter. Note that the security of both users’ data and the service itself depends strongly on choosing the password sensibly, which in the age of automated cracking programs unfortunately means the following:
- Use at least 15 characters
- Use a mixture of upper and lower case letters, numbers and at least one non-alphanumeric character
- Do not use dictionary words, common proper nouns or simple rearrangements of these
- Do not use family names, user identifiers, car registrations, media references, …
- Do not re-use a password in use on another system (this is for damage limitation in case of a compromise somewhere).
Passwords should be treated like credit card numbers (and not left around, emailed or shared etc). The above rules are similar to those which apply to systems elsewhere.
Remote desktops¶
It is also possible to connect to a remote VNC or X2GO desktop session on a login node. Please see the page on Remote desktops & 3D visualization for details.
File transfers¶
Any method of file transfer that operates over SSH (e.g. scp, sftp, rsync) should work to or from CSD3, provided SSH access works in the same direction. Thus systems from which it is possible to login should likewise have no difficulty using scp/sftp/rsync, and from CSD3 out to remote machines such connections should also work provided the other system does not block SSH (unfortunately, some sites do, and even more unfortunately, some even block SSH coming out). In whichever direction the initial connection is made, files can then be transferred in either direction. Note that obsolete and insecure methods such as ftp and rcp will not work (nor should you wish to use such things).
Any UNIX-like system (such as a Linux or MacOSX machine) should already have scp, sftp or rsync (or be able to install them from native media). Similarly these tools can be installed on Windows systems as part of the Cygwin environment. An alternative providing drag-and-drop operation under Windows is WinSCP, and in the same vein MacOSX or Windows users might consider cyberduck.
Of the command-line tools mentioned here, rsync is possibly the fastest, the most sophisticated and also the most dangerous. The man page is extensive but for example the following command will copy a directory called results in your home directory on CSD3 to the directory from_csd3/results on the local side (where rsync is being run on your local machine and your username is assumed to be abc123):
rsync -av abc123@login-cpu.hpc.cam.ac.uk:results from_csd3
Note that a final / on the source directory is significant for rsync - it would indicate that only the contents of the directory would be transferred (so specifying results/ in the above example would result in the contents being copied straight to from_csd3 instead of to from_csd3/results). A pleasant feature of rsync is that repeating the same command will lead to only files which appear to have been updated (based on the size and modification timestamp) being transferred. Rsync also validates each actual transfer by comparing checksums.
On directories containing many files rsync can be slow (as it has to examine each file individually). A less sophisticated but faster way to transfer such things may be to pipe tar through ssh, although the final copy should probably be verified by explicitly computing and comparing checksums, or perhaps by using rsync -avc between the original and the copy (which will do the equivalent thing and automatically re-transfer any files which fail the comparison). For example, here the same directory /home/abc123/results on CSD3 is copied to from_csd3/results on the local machine using this method:
cd from_csd3
ssh -e none abc123@login-cpu.hpc.cam.ac.uk 'cd /home/abc123 ; tar -cf - results' | tar -xvBf -
In the above the cd command is not actually necessary, but serves to illustrate how to navigate to a transfer directory in a different location to the /home directory.
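Both the tar pipe and the checksum verification can be rehearsed locally before trusting them with real data. In this sketch the ssh stage is left out so that it runs anywhere; on a real transfer the first tar runs on CSD3 exactly as shown above:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/results" "$tmp/from_csd3"
echo data > "$tmp/results/run1.dat"
# The pipe: pack on one side, unpack on the other (ssh would sit between the two tars).
( cd "$tmp" && tar -cf - results ) | ( cd "$tmp/from_csd3" && tar -xf - )
# Verify the copy by comparing checksums of original and copy:
a=$(cksum < "$tmp/results/run1.dat")
b=$(cksum < "$tmp/from_csd3/results/run1.dat")
[ "$a" = "$b" ] && echo verified    # prints "verified" when the copies match
rm -rf "$tmp"
```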
More on login nodes¶
A login node is, in terms of hardware, equivalent to a cluster compute node, but is connected to the external network and does not itself run jobs. It is intended for:
- compiling code
- developing applications
- submitting applications to the cluster for execution
- monitoring running applications
- post-processing and managing data.
On CSD3 there are multiple login nodes, representing each node type present in the cluster, to improve reliability, share the interactive workload and provide suitable development platforms for each architecture.
The name login-icelake.hpc.cam.ac.uk points automatically to one of the login nodes for use with icelake CPU or ampere GPU nodes - the IP address to which it resolves will belong to one of login-q-1.hpc.cam.ac.uk, …, login-q-4.hpc.cam.ac.uk.
The name login-cascadelake.hpc.cam.ac.uk points automatically to one of the login nodes for use with cclake CPU nodes - the IP address to which it resolves will belong to one of login-p-1.hpc.cam.ac.uk, …, login-p-4.hpc.cam.ac.uk.
The name login-cpu.hpc.cam.ac.uk (or simply login.hpc.cam.ac.uk) balances between all the above login nodes.
Rather than specifying one of the above nodes explicitly by name, it is usually better to use the login-cpu or login-icelake aliases, as these are more likely to land on a relatively lightly used login node. Finally, login is simply an alias for login-cpu.
If you have an X2Go or VNC session running on one of these nodes, you will of course want to connect to that node specifically, using its individual name, in order to connect to the correct session.
As noted above, the login nodes are not intended for production workload. Any large or long-lasting job is liable to be terminated by an automatic script called watchdog.