#################################################################################
West Cambridge Data Centre Upgrade and Planned Disruption May 2025 - January 2026
#################################################################################

.. important::

   * This page describes the timeline and milestones of the West Cambridge Data Centre upgrade project, and the expected impacts on services run from the Research Computing Data Hall (CSD3, Dawn, SRCP, RFS, RDS, RCS, Arcus).
   * Please note the multi-day full maintenance (no services) planned to commence on July 8th 2025.

********
Overview
********

The project to upgrade the power and cooling systems in the West Cambridge Data Centre (WCDC) is now in its implementation phase. It is expected to require several disruptions to services run from the data centre between May 2025 and its completion in late 2025 or early 2026.

The purpose of the project is to provide a sustainable increase in electrical and cooling capacity and so allow the expansion of services.

There have already been some unexpected service disruptions during the project, and more are possible as it progresses. Current information about the expected impacts on services and the status of the upgrade will be made available on this page.

**************
Current Status
**************

* [23/05/2025] The research computing data hall (DH1) has been limited to a maximum power consumption of 930 kW since 2nd May. This has led to a reduction of compute capacity on the CSD3 and Dawn systems, but none elsewhere.

*********
Key Dates
*********

Wednesday 28th May 2025
========================

Power will be reduced to 800 kW between 09:00 and 10:00 to allow generator tests, followed by staged power increases up to 1 MW. If this goes well and temperatures remain stable, this will leave us operating in production with increased compute capacity.

June/July 2025
==============

Pipework replacement impacting liquid-cooled systems (icelake, sapphire rapids, dawn), row by row.

- Expect a multi-day Arcus outage. We are exploring whether this can be combined with the full outage scheduled for 8-10th July.
- Other services will manage around this by changing which nodes are available.

July 8-10th (TBC)
=================

**All services down to allow power switching work to take place on 9th July.**

- Services will be shut down cleanly on 8th July, and restored on 10th July.
- DH1 will have no cooling for 2-3 hours, which requires us to perform a (near-)total shutdown.
- Disruptive core network replacement work will take place in this period.

August/Early September
======================

Disruption to rear door cooling, row by row.

- We expect to be able to manage around this with minimal service impact.
- **DH1 capacity returns to 1.3 MW.**

Late October
============

Repeat of power switching exercise to enable installation of the new distribution board.

- We should still have cooling during this work, but the success or otherwise of the work on 9th July is likely to determine the risk appetite for keeping services online.

November
========

Migration of individual bus bars to the new UPS (DH1 rows C-F [1]_), row by row.

- This will disrupt high power systems which are not resilient, which is likely to be manageable by changing which nodes are available.
- **DH1 capacity increases to 1.7 MW.**

November
========

Migration of DH1 rows C-F to the new generator.

- This will be an at-risk period as there will be no generator protection to rows C-F [1]_ in the event of a mains outage.

December 2025/January 2026
==========================

Installation of new transformer.

- Details still to be finalised. Potentially there will be an extended period of running from generator without mains available for some systems.
- **DH1 capacity increases to 1.8 MW.**

January 2026
============

Full commissioning.

- Disruption and risk to be determined.

*********
Questions
*********

If you have any questions about these developments or have issues before, during or after these periods, please contact us at support@hpc.cam.ac.uk.

.. [1] This refers to the racks in rows C-F in data hall 1. These contain elements of CSD3, SRCP, Arcus and storage, so parts of these services may be affected during these phases of the work.