CSD3 Full Maintenance 27th March 2025

Important

  • This full maintenance is required to allow University Estates to perform the power sequencing tests deferred from 30th January.
  • The planned maintenance was replaced by incident recovery following a power cut to the data hall.
  • CSD3 service was restored at 17:30 on Friday 28th March, but RCS filesystems are not yet available.

Current Status

  • Update March 28th 17:30
    • Power sequencing work was abandoned when the data centre experienced a brief but total loss of power at approximately 10:55.
    • Since then we have been working in incident recovery mode:
      • CSD3, RDS, RFS and cloud services are restored.
      • RCS and Dawn are not currently available.

Key Points

  • Login nodes, login-web and RDS/RCS gateways will reboot and cease to be available at 08:00 on Thursday 27th March.
  • All currently running jobs on CSD3 and Dawn will finish by 08:00 and no new jobs will start.
  • While the power sequencing work takes place we will take the opportunity to apply upgrades to Lustre, login nodes, and login-web.
  • Plans have now been heavily revised following the data-centre-wide incident: the power sequencing work will need to be rescheduled.

Questions

If you have any questions about these developments or have issues before or after the changes, please contact us at support@hpc.cam.ac.uk.