West Cambridge Data Centre Upgrade and Planned Disruption June 2025 - February 2026

Important

  • This page describes the timeline and milestones of the West Cambridge Data Centre upgrade project, and the expected impacts on services run from the Research Computing Data Hall (CSD3, Dawn, SRCP, RFS, RDS, RCS, Arcus/IRIS/SKA/Gaia).
  • Please note the multi-day full maintenance planned for July 21st-25th 2025.

Last updated: Thu Jul 24 17:19:25 BST 2025

Overview

The project to upgrade the power and cooling systems in the West Cambridge Data Centre (WCDC) is now in its implementation phase and is expected to cause several disruptions to the services run from it between May 2025 and its completion in late 2025 or early 2026. The purpose of the project is to provide a sustainable increase in electrical and cooling capacity and so allow the expansion of services. There have already been some unexpected service disruptions during the project, and more are possible as it progresses. Current information about the expected impacts on services and the status of the upgrade will be made available on this page.

Current Status

CSD3
  • Unavailable
Dawn
  • Unavailable
RDS/RCS
  • Unavailable
RFS
  • Available
IRIS/Gaia Hypervisors
  • Unavailable
Windows SRCP
  • Unavailable
Linux SRCP
  • Available
Arcus/Other IRIS hypervisors
  • Available

  • [24/07/2025] (17:00) The new temporary cooling system has been installed on schedule and is under test. The remaining network issues have been mitigated. RFS is now available and we still expect restoration of all services on Friday.
  • [23/07/2025] (15:30) Cooling work is approximately 50% complete and on schedule. There are still network issues under investigation and not all service endpoints are currently visible from outside the data centre. Other parts of the maintenance work are proceeding on schedule. At the moment most services are expected to return to normal operation on Friday.
  • [22/07/2025] (14:00) Cooling work is proceeding on schedule. The upgrade of the core network super-spine encountered technical issues leading to a loss of network contact with the outside world (also affecting access to this web page) 14:00 Monday - 14:00 Tuesday; part of this has now been rolled back pending further investigation.
  • [21/07/2025] (07:00) Full maintenance 21st July - 25th July has commenced. Services will shut down as described below.
  • [23/05/2025] The research computing data hall (DH1) has been limited to a maximum power consumption of 930kW since 2nd May. This has led to a reduction of compute capacity on the CSD3 and Dawn systems, but none elsewhere.
  • [17/06/2025] The onset of warm weather has further reduced the capacity of the temporary cooling system and we are currently limited to 900kW. We are monitoring the situation.
  • [11/07/2025] Heatwave conditions have further reduced capacity to 850kW.

Key Dates

July 21st-25th 2025

Previously scheduled for July 8th-10th.

All services unavailable to allow upgrade of the temporary cooling plant and disruptive changes to pipework, network and storage.

  • The outcome of this maintenance will be additional cooling capacity and upgraded internal network, plus progress towards replacement of old pipework.
  • Services (CSD3, Dawn, SRCP, Arcus/IRIS/SKA/Gaia, RDS, RCS, RFS) will be unavailable from the morning of 21st July, and will be restored as soon as possible.
  • For CSD3, Dawn and IRIS/Gaia, service will return once the cooling is back in operation, which is expected on Friday.
  • Other services may be restored earlier, but all services should be considered at-risk and subject to access interruptions until the end of the maintenance period.
  • After completion DH1 capacity increases to 1.3MW.

July 29th 2025

This could not be scheduled for 21st-25th June.

Urgent transformer repair work will reduce capacity on Tuesday 29th July.

  • This work is necessary to repair a faulty transformer and will require us to reduce load.
  • Please note there is a risk this will trip power in some rows containing IRIS hypervisors.

July/August 2025

Pipework replacement impacting liquid-cooled systems (Icelake, Sapphire Rapids, Dawn), row by row. This will allow connection of these systems to the new cooling system, and gradually free them from the current power constraints. Note that these liquid-cooled systems include SRCP, Arcus/IRIS/SKA/Gaia VMs and UKAEA Sapphire Rapids HBM. We expect the impact to be as follows:

  • a further outage of up to one day affecting VMs and Icelake login nodes, and two similar outages affecting Sapphire Rapids HBM nodes.
  • other services will manage around this by changing which nodes are available.

September 2025

Disruption to rear door cooling, row by row.
  • We expect to be able to manage around this with minimal service impact.
  • This work will increase the resilience of the cooling, removing some single points of failure as well as increasing the cooling capacity ready for when the power capacity is increased.
Rescheduling of delayed power sequencing work, to prepare for power upgrade to 1.8MW in the new year.
  • This has been rescheduled to September 18th.
  • This will affect the entire data centre and another full shutdown will be required September 17th-19th.

January/February 2026

Repeat of power switching exercise to enable installation of the new distribution board.
  • We should still have cooling during this work, but the success or otherwise of the sequencing work is likely to determine the risk appetite for keeping services online.
Migration of DH1 Rows C-F [1] to new power infrastructure including new UPS, Generators and Transformer.
  • This will disrupt high power systems which are not resilient, which is likely to be manageable by changing which nodes are available.
  • DH1 capacity increases to 1.8MW.

February 2026

Full commissioning.
  • Disruption and risk to be determined.

Questions

If you have any questions about these developments or have issues before, during or after these periods, please contact us at support@hpc.cam.ac.uk.

Change Log

  • [24/07/2025] Maintenance update.
  • [23/07/2025] Maintenance update.
  • [22/07/2025] Maintenance update post network blackout.
  • [21/07/2025] Maintenance start. Per service status update.
  • [18/07/2025] (17:04) IRIS/Gaia shutdown on July 20th clarified.
  • [18/07/2025] Information added about July 21st-25th maintenance.
  • [17/07/2025] Transformer repair work confirmed for July 29th.
  • [11/07/2025] July 21st-25th rescheduled cooling and network maintenance confirmed. September 18th date for rescheduled power sequencing confirmed.
  • [04/07/2025] Mark July 8-10 as cancelled.
  • [27/06/2025] Update re July 8-10 and subsequent timeline.
  • [24/06/2025] Update post June 24th events.
  • [23/06/2025] Updated dates and details for work on July 8-10th.
  • [17/06/2025] Warm weather update, transformer repair for 24th June added and July full maintenance update.
  • [10/06/2025] Version string and change log added.
  • [23/05/2025] Page created.
[1] This refers to the racks in rows C-F in data hall 1. These contain elements of CSD3, SRCP, Arcus and storage, so parts of these services may be affected during these phases of the work.