CSD3/Dawn Full Maintenance 30th April - 2nd May 2025

Important

  • This full maintenance is one of several that will be required to allow University Estates to progress the Data Centre upgrade work.

Current Status

  • 07:00 30th April: Maintenance has commenced and login nodes have rebooted.
  • 12:50: Slurm upgrade work is complete. Job scheduling is currently suspended on all partitions except the desktop partitions, but new jobs can still be submitted.
  • Work on the pipework supplying water to the cooling system in the Data Hall has been underway since 09:00 on 30th April. Once this is complete, the performance of the cooling will be verified with a synthetic load on the cluster.
  • 09:00 1st May: Work on the pipework and cooling units will continue today, followed by load testing.
  • 17:30 1st May: Work has continued today but more slowly than planned, so we will not be returning to production service this evening. Work will continue tomorrow morning with the aim of restoring service during the afternoon, in time for the long weekend.
  • 16:46 2nd May: Job scheduling has been resumed.
  • As of 2nd May 2025, the Research Computing Data Hall is operating at a reduced power capacity of 930 kW.

Key Points

  • Login nodes, login-web and RDS/RCS gateways will remain available during the full maintenance; however, the login-p, login-q and login-s login nodes will reboot at 07:00 in order to implement resource-sharing improvements.
  • All currently running jobs on CSD3 and Dawn will finish by 07:00 Wednesday 30th April and no new jobs will start until maintenance is completed.
  • Slurm services may be unavailable for short periods as updates are applied.

Questions

If you have any questions about these developments or have issues before or after the changes, please contact us at support@hpc.cam.ac.uk.