West Cambridge Data Centre Upgrade and Planned Disruption June 2025 - February 2026
Important
- This page describes the timeline and milestones of the West Cambridge Data Centre upgrade project, and the expected impacts on services run from the Research Computing Data Hall (CSD3, Dawn, SRCP, RFS, RDS, RCS, Arcus/IRIS/SKA/Gaia).
Last updated: Sat Sep 20 03:14:44 BST 2025
Overview
The project to upgrade the power and cooling systems in the West Cambridge Data Centre (WCDC) is now in its implementation phase and is expected to require several disruptions to the services it hosts between May 2025 and its completion in early 2026. The purpose of the project is to provide a sustainable increase in electrical and cooling capacity, allowing services to expand. There have already been some unexpected service disruptions during the project, and more are possible as it progresses. Current information about expected service impacts and the status of the upgrade will be posted on this page.
Current Status
- WCDC Power/Cooling Capacity: 1.2MW
- CSD3: Available
- Dawn: Available
- RFS/RDS/RCS: Available
- IRIS/Gaia Hypervisors: Available
- Windows SRCP: Available
- Linux SRCP: Available
- Arcus/Other IRIS hypervisors: Available
Key Dates
September 2025
- Wednesday September 17th - Friday September 19th 2025
- Rescheduling of delayed power sequencing work, to prepare for power upgrade to 1.8MW in the new year.
- CSD3, Dawn, RDS, RCS, RFS, SRCP, Arcus and IRIS hypervisors will be unavailable from 09:00 on 17th September, returning on 19th September.
- During this downtime, this web page will not be available - please check this page instead.
- This work completed successfully and services have been restored.
November 2025
- Disruption to rear door cooling, row by row.
- We expect to be able to manage around this with minimal service impact.
- This work will increase the resilience of the cooling, removing some single points of failure as well as increasing the cooling capacity ready for when the power capacity is increased.
January 2026
- Repeat of power switching exercise to enable installation of the new distribution board.
- We should still have cooling during this work, but the success or otherwise of the sequencing work is likely to determine the risk appetite for keeping services online.
March/April 2026
- Migration of DH1 Rows C-F [1] to new power infrastructure including new UPS, Generators and Transformer.
- This will disrupt high-power systems that are not resilient; we expect to manage this by changing which nodes are available.
- DH1 capacity increases to 1.8MW.
- Full commissioning.
- Disruption and risk to be determined.
Questions
If you have any questions about these developments or have issues before, during or after these periods, please contact us at support@hpc.cam.ac.uk.
Change Log
- [20/09/2025] (03:15) Maintenance complete.
- [19/09/2025] (18:40) Maintenance (mostly) complete.
- [19/09/2025] (17:00) Maintenance progress.
- [16/09/2025] (13:16) Update notice of 17-19 Sept downtime.
- [03/09/2025] (14:45) Update following disruptive planned work on 3rd Sept.
- [03/09/2025] (09:30) Tidy up to remove info from pre-September
- [02/09/2025] (17:00) General updates to schedule, including a one-day outage to some systems on 3rd Sept
- [21/08/2025] (17:00) Closure of major incident.
- [20/08/2025] (19:20) Status update - continuing to prove cooling.
- [19/08/2025] (21:00) Status update - artificial load.
- [15/08/2025] (17:15) Status update.
- [14/08/2025] (18:00) Update on chiller repair.
- [13/08/2025] (16:00) Update on partial service resumption.
- [13/08/2025] (14:00) Partial resumption of HPC service.
- [13/08/2025] (09:45) Update on chiller 2 high pressure fault.
- [12/08/2025] (15:00) Update on chiller repair.
- [12/08/2025] (11:00) Update on chiller repair.
- [12/08/2025] (10:00) Update on chiller repair.
- [11/08/2025] (15:35) Update on phased load increase on Tuesday.
- [11/08/2025] (09:15) Partial resumption of service (short jobs only).
- [08/08/2025] (15:55) Weekend suspension of service.
- [08/08/2025] Reduced capacity while cooling failure remains under investigation.
- [07/08/2025] (18:50) Chiller failure update - no jobs running overnight.
- [07/08/2025] Chiller failure.
- [05/08/2025] DLC pipework update.
- [29/07/2025] Transformer repair update.
- [25/07/2025] Full maintenance complete.
- [25/07/2025] Maintenance update.
- [24/07/2025] Maintenance update.
- [23/07/2025] Maintenance update.
- [22/07/2025] Maintenance update post network blackout.
- [21/07/2025] Maintenance start. Per service status update.
- [18/07/2025] (17:04) IRIS/Gaia shutdown on July 20th clarified.
- [18/07/2025] Information added about July 21st-25th maintenance.
- [17/07/2025] Transformer repair work confirmed for July 29th.
- [11/07/2025] July 21st-25th rescheduled cooling and network maintenance confirmed. September 18th date for rescheduled power sequencing confirmed.
- [04/07/2025] Mark July 8-10 as cancelled.
- [27/06/2025] Update re July 8-10 and subsequent timeline.
- [24/06/2025] Update post June 24th events.
- [23/06/2025] Updated dates and details for work on July 8-10th.
- [17/06/2025] Warm weather update, transformer repair for 24th June added and July full maintenance update.
- [10/06/2025] Version string and change log added.
- [23/05/2025] Page created.
[1] This refers to the racks in rows C-F in data hall 1. These contain elements of CSD3, SRCP, Arcus and storage, so parts of these services may be affected during these phases of the work.