Impact
As a result of the AI arms race, AI data center design and deployment are occurring at a record pace. Speed to compute has become one of the key metrics of successful projects and companies.
Traditional approaches to the overall project delivery model are being significantly strained and tested; in particular, the feedback from commissioning to design and operations must accelerate. AI data centers are deploying new technology at a rate never before seen in the industry, as technical advances in compute, power, cooling, and all the supporting infrastructure are in a never-ending state of research and development. As such, facilities are commencing construction with partially completed design documents that are constantly being tested by the release of new technical information from IT manufacturers and power and cooling manufacturers. Often this approach to rapid construction limits the impact a Design-Phase Commissioning review can provide.
Because of the rapid rate of deployment, new technologies can no longer be fully vetted through traditional cycles of manufacturer research and development and developer proofs of concept before the first scaled-up project commences. This puts the pressure for facility and infrastructure design and continuous improvement on the owner, the design team, and the commissioning agent. When engaged in a traditional workflow, the commissioning agent (CxA) is often challenged to engage sufficiently in the design process to provide meaningful insight. Integrating the commissioning process (CxP) into the real-time design and review process can maximize the CxA’s value.
Likewise, the penalties imposed by end users on the owner (whether internal or external) for delays or failures are increasing because of the commercial pressures for speed to compute. The projects must be built faster, with new technologies, while simultaneously retaining the quality we have come to expect in mission-critical facilities. This requires closer coordination and collaboration between all parties in the project delivery chain, with additional skilled workforce on-site to support the combination of speed and technical complexity. The combination of rapid deployment of new technologies with partially completed designs adds uncertainty to project execution schedules and inevitably puts pressure on the activities at the tail end of the project, namely commissioning, to be adjusted to meet contractual delivery timelines.
The data center industry has always had some form of feedback from commissioning back to design to enable iterative improvement of those designs. However, the current speed of deployments, combined with the dollars involved, requires faster feedback among all members of the project team.
In traditional data center construction, large portions of a data hall or data center would be turned over from the construction team to the commissioning team after extensive quality assurance and quality control (QA/QC) activities had been executed. Commissioning occurs in phases, known as levels. Level 1 (L1) focuses on factory acceptance testing to ensure components meet design expectations before shipment, while Level 2 (L2) verifies proper delivery and installation on-site. Level 3 (L3) conducts pre-functional checks to confirm individual components and subsystems are ready for operation, followed by Level 4 (L4), which tests full functional performance of each system under various conditions. Finally, Level 5 (L5) performs integrated systems testing, ensuring all systems work together seamlessly under real-world and failure scenarios to validate overall facility reliability and readiness for operation. Downtime or inactivity on certain systems between L2 and L3, and between L3 and L4, was expected and normal while waiting for large tranches of infrastructure to be ready to test.
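The level sequence and the idle time it produces can be illustrated with a minimal sketch. It assumes a simplified model in which a tranche advances to a level only once every system in it is ready for that level; the names `CxLevel`, `next_level`, and `tranche_status`, and the example system names, are hypothetical and not part of any commissioning standard.

```python
from enum import IntEnum


class CxLevel(IntEnum):
    """Commissioning levels as summarized in the text (names are illustrative)."""
    L1_FACTORY_ACCEPTANCE = 1   # factory acceptance testing before shipment
    L2_DELIVERY_INSTALL = 2     # delivery and installation verified on-site
    L3_PRE_FUNCTIONAL = 3       # pre-functional checks of components and subsystems
    L4_FUNCTIONAL = 4           # functional performance testing of each system
    L5_INTEGRATED = 5           # integrated systems testing


def next_level(completed):
    """Return the next level a system may enter, or None once all are complete.

    `completed` is a set of CxLevel values already signed off. Levels run in
    order, so the first level missing from the set is the next one.
    """
    for level in CxLevel:
        if level not in completed:
            return level
    return None


def tranche_status(systems):
    """Return (gate, idle) for a tranche that is turned over as one block.

    Assumption: the tranche is tested at the level of its least-advanced
    system (the gate). Systems already past that gate sit idle until the
    rest of the tranche catches up.
    """
    ready = {name: next_level(done) for name, done in systems.items()}
    pending = [lvl for lvl in ready.values() if lvl is not None]
    if not pending:
        return None, []
    gate = min(pending)
    idle = sorted(n for n, lvl in ready.items() if lvl is not None and lvl > gate)
    return gate, idle


# Hypothetical tranche: one system has finished L2, the other only L1.
systems = {
    "chiller-1": {CxLevel.L1_FACTORY_ACCEPTANCE, CxLevel.L2_DELIVERY_INSTALL},
    "ups-a": {CxLevel.L1_FACTORY_ACCEPTANCE},
}
gate, idle = tranche_status(systems)
print(gate.name, idle)
```

In this model, the larger the tranche, the more systems tend to wait on the slowest one, which is the idle time the paragraph describes.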
Now, more non-traditional approaches to the sequence of construction and commissioning should be considered, including scheduling and turnover of smaller blocks of infrastructure to accelerate testing. This can speed up the diagnosis of systemic issues and help prioritize certain construction and QA/QC activities. However, this also requires a more dynamic approach to the scheduling and handover process, as well as very close coordination and collaboration between the construction and commissioning teams. Specifically, systems or activities that involve new technologies, products, or construction methodologies deserve careful consideration when determining which activities carry low enough risk to streamline under schedule pressure.
Highlights
- The pace and complexity of AI data center deployment, combined with the business impacts of failures, drive the need for larger commissioning teams focused on quality, speed, and accelerated feedback to the team.
- Design-phase commissioning becomes more critical as the accelerated pace leaves fewer opportunities for adjustments if the CxA is not engaged to review and provide feedback on the design in parallel with development of that design.
- The pace of construction and high penalties for late delivery are forcing construction and commissioning teams to reimagine the process and milestones for handover from construction to commissioning. More granular milestones, which can be harder to schedule and track, enable earlier testing and commissioning but require much closer coordination between teams.
- Earlier focus on construction and commissioning of monitoring and control systems helps to enable faster commissioning by streamlining data capture and analysis during L4 and L5 testing.
- Measurement and Verification (M&V) Plans, which are often treated as almost an afterthought, must start meaningful development at the start of design. Specifically, early implementation of this infrastructure can aid in the identification of issues and determination of corrective actions for new technologies and infrastructure designs.
- Especially in today’s public relations climate regarding AI data centers, understanding, quantifying, and managing impacts to the surrounding community are critical. Matters such as influencing grid power quality, managing power demand, and optimizing water use profiles are all more critical than ever.
Discussion
Throughout the design, construction, commissioning, and operations phases of a data center’s life, there must be seamless integration of requirements and execution for the project to realize its full value proposition. Historically, many data centers have been built as one-offs, or as part of a small fleet with significant time between the occupancy of one and the design of the next. In today’s AI factory development market, data centers are frequently designed and built with a modular concept from a prototype that is then repeated numerous times on the same campus or across a corporate portfolio.
To optimize the performance of the portfolio, a rapid feedback loop from commissioning and operations to the design and construction is critical. Lessons learned cannot necessarily wait months for discussion and arbitration to be applied to the next phase, as many times the next phase itself is trailing by only a few months. Larger teams of experienced professionals are often needed to accelerate the feedback loop while continuing to progress the construction, deployment, commissioning, and operations of the facilities.
AI factories, especially liquid-cooled deployments, are particularly susceptible to equipment fouling due to insufficient cleanliness and preparation during construction and startup. Proper cleaning, flushing, and passivation of hydronic systems, especially technology cooling systems (TCS), should never be compromised in the name of speed. Failure to maintain proper fluid cleanliness can result in extensive project delays if IT equipment cold plates become fouled during deployment or if information technology equipment (ITE) is exposed to fluid leaks due to insufficient rigor during commissioning.
The speed of design, construction, commissioning, and turnover is straining the traditional approaches to commissioning, where construction turns over large blocks of infrastructure at a given time. Smaller blocks of infrastructure on accelerated timelines, with QA/QC teams barely ahead of commissioning, are becoming more common. To optimize the speed of overall delivery without sacrificing quality, teams are increasingly adopting more granular milestones, more aggressive tracking, and more intensive coordination between construction and commissioning.
Instrumentation and Controls Systems that execute the M&V Plan are often the last systems fully commissioned. However, if they are brought online early enough in the process, they can provide the best value for troubleshooting issues as well as recording baseline acceptable performance of the systems during L4 and L5 testing. Having a trended and recorded data set from L4 and L5 commissioning on the Historian server is extremely valuable to the Operations Team’s ability to validate, during occupancy and commercial operations, whether the systems in the facility are operating per design. The more baseline data that can be recorded and stored, the better positioned the Operations Team is to provide highly reliable systems and data center operation.
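As a rough illustration of how a recorded commissioning baseline might be used during operations, the sketch below summarizes trended readings and flags a later reading that strays from them. The function names, the sample values, and the three-standard-deviation threshold are all assumptions made for illustration; an actual M&V Plan would define its own acceptance criteria and data sources.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize trended readings recorded during L4/L5 testing."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def deviates(baseline, reading, k=3.0):
    """Return True if a reading is more than k standard deviations from the baseline mean.

    The k = 3 default is an illustrative assumption, not a recommended criterion.
    """
    return abs(reading - baseline["mean"]) > k * baseline["stdev"]


# Invented example values: coolant supply temperature in degrees C, as it
# might be trended during functional testing.
commissioning_samples = [18.0, 18.2, 17.9, 18.1, 18.0]
baseline = build_baseline(commissioning_samples)

# Readings taken later, during commercial operations (also invented).
for reading in (18.3, 19.5):
    status = "investigate" if deviates(baseline, reading) else "within baseline"
    print(f"{reading:.1f} C: {status}")
```

A comparison like this is only as good as the baseline behind it, which is one reason to bring the trending infrastructure online before L4 and L5 testing rather than after.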
Operator training, which is typically left for the end of the project when L5 commissioning is largely complete, often cannot be performed at that point due to project delivery timelines to the end customer. Methods of Procedure (MOPs) are often either not drafted or not validated prior to commercial operations, introducing risk into their execution in a live environment. L4 and L5 commissioning activities are typically the single best opportunity for facility operators to see the equipment performing properly and responding to failure scenarios, to troubleshoot issues, and to validate MOPs without significant risk. No amount of training videos can replace the value of witnessing L4 and L5 testing, working with the startup technicians, and walking through MOPs ahead of commercial operations. Integrating the Operations Team into the commissioning process and validating MOPs during commissioning requires commitment from the owner or operator for staffing and resource allocation ahead of the start of revenue-generating operations. However, viewing such commitments through the lens of risk mitigation rather than overhead cost can ease the approval process.
Recommended Practices