Retrofit & Modernization Strategies

AI Data Center Energy Performance Framework


Impact

The rise of AI workloads fundamentally disrupts traditional data center design by pushing rack densities (often >100 kW/rack) beyond the limits of air cooling and by introducing massive, synchronized power spikes that threaten to trip legacy breakers. Operations must shift from static capacity planning to dynamic “digital twin” simulations to manage these volatile loads without compromising uptime.

This topic is critical because most existing enterprise data centers were built for steady-state, low-density workloads; without strategic retrofitting, these facilities face immediate obsolescence or catastrophic failure when tasked with modern AI training or inference operations.

Retrofit strategy is now mission-critical in the AI era because most legacy data centers were never designed for the extreme densities, liquid-cooling requirements, and synchronous power behavior of modern AI systems, yet these buildings must remain operational throughout the upgrade. Unlike new construction, AI retrofits must be executed within live environments and around legacy equipment, outdated documentation, and teams accustomed to traditional operating modes. As a result, AI retrofits often fail operationally rather than technically: even when the engineering is sound in theory, the retrofit introduces new procedures, unfamiliar failure modes, and steep learning curves for operators who may have no experience with liquid cooling, ultra-dense GPU clusters, or electrical design point (EDP) transients. New maintenance sequences, risk-mitigation steps, and rapid-response protocols can therefore undermine otherwise solid engineering plans. Successful modernization requires not just upgrades to power and cooling infrastructure, but also proactive workforce upskilling and integrated commissioning practices (see the Commissioning and Performance Validation section) that validate procedures and operator readiness as thoroughly as the hardware itself.

The purpose of this section is to quantify the “capability gap” between legacy infrastructure and AI requirements, establishing the urgent business and technical case for the specific engineering retrofits detailed below.

Highlights

Adopt a Hybrid Cooling Architecture
Transitioning a data center facility to AI workloads does not require abandoning existing air-cooling investments. The most effective strategy is a hybrid approach: deploy direct-to-chip (DTC) liquid cooling to handle the intense heat of GPU processors (often >100 kW/rack), while maintaining legacy air-cooling systems (CRAC/CRAH) to manage the remaining 10-30% of heat generated by memory, power supplies, storage, and networking gear.

Engineer for “Synchronous” Power Volatility
Legacy power systems were designed for steady-state averages, not the volatile “heartbeat” of AI training. Operators must retrofit power infrastructure to handle the electrical design point (EDP): transient spikes in which chips draw up to 50% above their rated power. This requires either ensuring that switchgear and UPS systems have sufficient “headroom” or deploying local energy storage to buffer these millisecond step-loads.

Fortify Structural Integrity for Ultra-Density
The physical weight of AI infrastructure is a critical, often overlooked constraint. With fully loaded liquid-cooled racks exceeding 1,800 kg (4,000 lb), facilities must undergo structural audits. Retrofits often require reinforcing sub-floors with heavy-duty stringers or load-distributing plates to prevent raised-floor collapse.

Transition to “Digital Twin” Operations
As density increases, the margin for error disappears. Operators should move away from static capacity planning (spreadsheets) toward digital twin software. This allows for the simulation of failure scenarios and power spikes in a virtual environment before physical deployment, ensuring that breaker coordination and cooling loops remain stable under stress.

Discussion

Data center modernization and retrofitting refers to the strategic process of upgrading existing facility infrastructure—power, cooling, and structural elements—to support next-generation workloads without constructing a new building from scratch.

In the context of artificial intelligence (AI), this often involves transforming general-purpose compute environments into specialized “AI factories.” A critical distinction in this domain is understanding the two primary types of AI workloads: training, which involves massive, sustained computational intensity to build models, and inference, which runs established models to generate outputs. While training requires the highest densities (often necessitating 100 kW+ per rack), inference workloads can often be integrated into existing enterprise facilities with more modest retrofits.

The essential context for understanding this topic is the “capability gap” between legacy designs and modern realities. Most existing data centers were engineered for “asynchronous” workloads, where server activity is random and averages out over time. AI workloads, conversely, are highly “synchronous”; thousands of GPUs often spike in unison, creating massive step-loads that can destabilize standard electrical systems.
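To make the asynchronous versus synchronous contrast concrete, the following minimal sketch sums the load of a small group of GPUs over one duty cycle. The function names, the 50% duty cycle, and the per-GPU idle and peak figures are illustrative assumptions, not measurements from any facility.

```python
def aggregate_load(num_gpus, period, idle_kw, peak_kw, staggered):
    """Total load (kW) at each time step over one period.

    Assumed toy model: every GPU runs at peak_kw for the first half of its
    cycle and at idle_kw for the second half. When staggered is True the
    cycles are offset evenly (an idealized "asynchronous" fleet); otherwise
    every GPU switches at the same instant ("synchronous").
    """
    totals = []
    for t in range(period):
        total = 0.0
        for g in range(num_gpus):
            offset = (g * period) // num_gpus if staggered else 0
            phase = (t + offset) % period
            total += peak_kw if phase < period // 2 else idle_kw
        totals.append(total)
    return totals


def largest_step(totals):
    """Largest change in total load between consecutive time steps."""
    return max(abs(b - a) for a, b in zip(totals, totals[1:]))


# Hypothetical figures: 8 GPUs, 1.0 kW peak, 0.2 kW idle.
asynchronous = aggregate_load(8, 8, 0.2, 1.0, staggered=True)
synchronous = aggregate_load(8, 8, 0.2, 1.0, staggered=False)
print("asynchronous largest step (kW):", round(largest_step(asynchronous), 3))
print("synchronous largest step (kW):", round(largest_step(synchronous), 3))
```

Both fleets in this toy model have the same average load; only the synchronous one presents the electrical system with a large step between consecutive time steps.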

Furthermore, traditional facilities were typically designed for rack densities of 5-10 kW using air cooling. Modern AI hardware now pushes densities well beyond the thermal limits of air, making the integration of direct-to-chip (DTC) liquid cooling and higher-voltage power distribution not just efficient upgrades, but operational necessities to prevent equipment failure. Liquid cooling is challenging to design and operate as a complete system, so its impact must be well understood and accounted for in any modernization plan.

The highest-impact aspects of this transition are density and power volatility. Unlike standard server refreshes, AI retrofits force a fundamental rethinking of the “white space.” Operators must grapple with the electrical design point (EDP), a phenomenon in which AI chips briefly draw up to 50% more power than their thermal rating, requiring power infrastructure (switchgear and UPS) with significant “headroom” or specialized buffering capabilities.
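The headroom check implied by the EDP figure can be sketched as simple arithmetic. This is a minimal illustration, assuming the 50% overshoot applies only to the GPU portion of the load and that all GPUs spike at once; the function names and the example capacities are hypothetical.

```python
def edp_peak_kw(rated_kw: float, overshoot: float = 0.5) -> float:
    """Transient peak for a load whose chips overshoot their rating.

    overshoot=0.5 reflects the "up to 50%" figure quoted in the text.
    """
    return rated_kw * (1.0 + overshoot)


def headroom_ok(capacity_kw: float, gpu_rated_kw: float,
                other_kw: float, overshoot: float = 0.5) -> bool:
    """True if upstream capacity covers the worst-case synchronous spike.

    Assumption: non-GPU load (other_kw) stays flat during the transient.
    """
    return edp_peak_kw(gpu_rated_kw, overshoot) + other_kw <= capacity_kw


# Hypothetical hall: 600 kW of GPU load plus 150 kW of other IT load.
print(headroom_ok(capacity_kw=1000, gpu_rated_kw=600, other_kw=150))
print(headroom_ok(capacity_kw=1200, gpu_rated_kw=600, other_kw=150))
```

In this example a 1,000 kW plant looks adequate against the 750 kW steady-state load but falls short of the 1,050 kW transient, which is the gap that headroom or buffering has to close.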

Additionally, the physical weight of these high-density racks—often exceeding 1,800 kg (4,000 lb) because of fluids, heavy piping, and heat sinks—challenges the structural integrity of legacy raised floors, forcing operators to reinforce sub-floors or deploy weight-distributing plates.
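As a rough illustration of why rack mass matters, the sketch below converts a rack's mass into the loads a floor would see. It assumes the weight is spread evenly over the footprint and over four supports, which real racks rarely achieve; the function name is hypothetical, the footprint uses the 750 mm by 1,200 mm minimum rack size recommended later in this section, and any comparison against a floor rating must use the actual manufacturer and structural engineering data.

```python
G = 9.80665  # standard gravity, m/s^2


def rack_floor_loads(mass_kg, width_m, depth_m, supports=4):
    """Return weight and idealized floor loads for one rack.

    Assumptions: static load only, weight shared equally by `supports`
    casters or leveling feet, and uniform pressure over the footprint.
    """
    weight_kn = mass_kg * G / 1000.0
    footprint_m2 = width_m * depth_m
    return {
        "weight_kn": weight_kn,
        "uniform_kpa": weight_kn / footprint_m2,
        "per_support_kn": weight_kn / supports,
    }


# 1,800 kg rack on a 0.75 m x 1.2 m footprint.
loads = rack_floor_loads(1800, 0.75, 1.2)
for name, value in loads.items():
    print(f"{name}: {value:.2f}")
```

The per-support figure is the one to compare against a raised-floor panel's concentrated-load rating, since point loads rather than average pressure usually govern.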

Recommended Practices

1. Migrate to 415 V Distribution

Move away from legacy 120/208 V architectures to 230/400 V or 240/415 V distribution. This reduces the sheer volume of cabling required for high-density racks and improves electrical efficiency by eliminating intermediate step-down transformers, a critical step when retrofitting constrained spaces. If the retrofit includes upgrades to the electrical service, or if modular spaces exist to expand the data center, consider an upgrade to 800 V DC power infrastructure. Whether this is appropriate depends on the IT equipment (ITE) in the data center and the enterprise's business objectives (see the Integrated Design section).
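The cabling benefit follows from the standard balanced three-phase relation I = P / (√3 · V · PF). The sketch below applies it to a single rack; the 100 kW rack and the unity power factor are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math


def line_current_a(power_kw, line_voltage_v, power_factor=1.0):
    """Line current (A) for a balanced three-phase load.

    line_voltage_v is the line-to-line voltage (208 V or 415 V here).
    """
    return power_kw * 1000.0 / (math.sqrt(3) * line_voltage_v * power_factor)


# Hypothetical 100 kW rack fed at the two line-to-line voltages.
for volts in (208, 415):
    print(f"{volts} V: {line_current_a(100, volts):.1f} A")
```

For the same power, the 415 V feed carries roughly half the current of the 208 V feed, which is what allows fewer or smaller conductors per rack.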
2. Implement Active Harmonic Filtering

Because the coolant distribution units (CDUs) used for liquid cooling rely on variable frequency drives (VFDs) for their pumps, they can introduce significant harmonic distortion. Install active harmonic filters or isolate CDUs on separate UPS systems to prevent these harmonics from disrupting sensitive IT loads or overheating neutral conductors.
3. Deploy Local Energy Storage for Peak Shaving

To manage the EDP, where AI chips spike up to 50% above their thermal rating for milliseconds, deploy fast-response energy storage (e.g., supercapacitors or battery energy storage systems (BESS)) downstream of the UPS. This cutting-edge practice spares operators from massively oversizing the entire generator/UPS plant just to handle transient millisecond spikes. An additional advantage is that the same storage can support local grid stability during peak demand.
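A first-pass sizing of such a buffer can be sketched as the extra power multiplied by the spike duration. The 50 ms duration, the 1,000 kW cluster, and the function name are assumptions for illustration; real sizing would also account for spike repetition rate, converter efficiency, and recharge time.

```python
def edp_buffer_requirements(rated_kw, overshoot=0.5, spike_duration_s=0.05):
    """Return (extra power in kW, energy in kJ) for one transient.

    overshoot=0.5 reflects the "up to 50%" figure in the text;
    spike_duration_s is an assumed value in the millisecond range.
    """
    spike_kw = rated_kw * overshoot          # power above the rated load
    energy_kj = spike_kw * spike_duration_s  # kW x s = kJ
    return spike_kw, energy_kj


# Hypothetical 1,000 kW cluster.
spike_kw, energy_kj = edp_buffer_requirements(1000)
print(f"extra power: {spike_kw:.0f} kW, energy per spike: {energy_kj:.1f} kJ")
```

The energy per spike is small while the power is large, which is why fast, power-dense devices such as supercapacitors are natural candidates for this role.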
4. Restrict Fault Currents for Safety

When upsizing transformers to support higher loads, impedance typically drops, raising fault currents to dangerous levels. Verify that retrofitted systems utilize high-impedance transformers or current-limiting circuit breakers to keep fault currents below 10 kA, ensuring arc flash safety for personnel.
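The effect of transformer size on fault current can be estimated with the common infinite-bus approximation, in which the bolted fault current at the secondary equals the full-load current divided by the per-unit impedance. The sketch below is a screening calculation only: it ignores source and cable impedance and motor contribution, and the transformer ratings, the 5% impedance, and the function names are hypothetical.

```python
import math


def full_load_current_a(kva, line_voltage_v):
    """Rated secondary current (A) of a three-phase transformer."""
    return kva * 1000.0 / (math.sqrt(3) * line_voltage_v)


def bolted_fault_ka(kva, line_voltage_v, impedance_pct):
    """Approximate secondary fault current (kA), infinite-bus assumption."""
    amps = full_load_current_a(kva, line_voltage_v) / (impedance_pct / 100.0)
    return amps / 1000.0


def min_impedance_pct(kva, line_voltage_v, limit_ka):
    """Impedance (%) needed to hold the fault current at limit_ka."""
    return full_load_current_a(kva, line_voltage_v) / (limit_ka * 1000.0) * 100.0


# Hypothetical upsizing from 300 kVA to 750 kVA at 415 V, both at 5% impedance.
for kva in (300, 750):
    print(f"{kva} kVA: {bolted_fault_ka(kva, 415, 5.0):.1f} kA")
print(f"750 kVA needs >= {min_impedance_pct(750, 415, 10.0):.1f}% for 10 kA")
```

In this example the smaller unit stays under the 10 kA target while the larger one does not, so it would need higher impedance or downstream current limiting.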
5. Adopt a Hybrid Cooling Strategy
Do not attempt to cool high-density AI clusters (e.g., >50 kW/rack) solely with air. Implement DTC liquid cooling for processors while retaining legacy CRAC/CRAH units to manage the residual 10-30% of heat generated by power supplies, memory, storage, and networking gear. Liquid-to-air CDUs may offer an upgrade path to legacy facilities, allowing the facility to make the most of its existing infrastructure, but this is not recommended at scale for efficiency reasons.
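The hybrid split can be turned into a first-order sizing estimate using Q = ṁ · cp · ΔT. The sketch below assumes plain water (cp of about 4.186 kJ/kg·K, about 1 kg per liter), an 80% liquid capture fraction, and a 10 K coolant temperature rise; glycol mixtures and real cold plates will differ, and the function names are hypothetical.

```python
WATER_CP_KJ_PER_KG_K = 4.186   # assumed: plain water
WATER_DENSITY_KG_PER_L = 1.0   # assumed: plain water near room temperature


def split_rack_heat(rack_kw, liquid_fraction=0.8):
    """Return (heat to liquid, heat to air) in kW for one rack.

    liquid_fraction=0.8 sits inside the 70-90% range implied by the
    10-30% residual air load quoted in the text.
    """
    liquid_kw = rack_kw * liquid_fraction
    return liquid_kw, rack_kw - liquid_kw


def coolant_flow_lpm(heat_kw, delta_t_k):
    """Coolant flow (L/min) needed to carry heat_kw at a given temperature rise."""
    mass_flow_kg_s = heat_kw / (WATER_CP_KJ_PER_KG_K * delta_t_k)
    return mass_flow_kg_s / WATER_DENSITY_KG_PER_L * 60.0


# Hypothetical 100 kW rack with a 10 K rise across the cold plates.
liquid_kw, air_kw = split_rack_heat(100)
print(f"liquid: {liquid_kw:.0f} kW, air: {air_kw:.0f} kW")
print(f"flow: {coolant_flow_lpm(liquid_kw, 10):.1f} L/min")
```

The residual air figure is the load the legacy CRAC/CRAH units still have to remove per rack, which in this example is comparable to or larger than an entire legacy rack.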
6. Reject Heat
When modernizing legacy facilities for AI workloads, operators should evaluate the full range of existing heat rejection options and select upgrades that maximize efficiency within local climate constraints. In cooler or mixed climates, installing new high‑temperature (HT) chillers or dry coolers enables operation at elevated water temperatures, which greatly expands the number of free‑cooling hours and reduces reliance on mechanical refrigeration. In warmer climates, dry coolers may require adiabatic assist, while HT chillers provide reliable performance during peak ambient conditions. The most efficient retrofit strategy is typically a hybrid configuration where the system uses dry coolers for the majority of the year and allows HT chillers to operate only when outdoor temperatures exceed free‑cooling limits. This strategy minimizes energy use while keeping year‑round thermal stability.

Equipment can operate outside the recommended ranges provided in the ASHRAE TC 9.9 Datacom Encyclopedia. Moving to warmer and more humid allowable conditions (while staying within class A1) increases the number of economizer hours per year. Turning off the compressors in CRACs and chillers drastically reduces the energy consumption of the entire system and allows the outdoor heat rejection system for waterside economizers to provide total or partial cooling for more hours throughout the year. Oversizing dry coolers also extends the number of economizer hours per year, though an OPEX/CAPEX evaluation is needed to justify it.
- Oversized air-cooled condensers result in lower head pressure for the compressors in CRACs and therefore allow for some energy savings (the compressors do not have to work as hard for any given capacity). Likewise, micro-channel air-cooled condensers can lower head pressure for a given footprint due to their efficient design.
- V-frame or A-frame dry cooler designs allow for greater heat transfer surface area for any given footprint, potentially leading to increased economizer hours.
- Using evaporatively assisted air-cooled condensers and dry coolers in areas where water is readily available can also increase economizer hours as well as reduce the power to the condenser/dry cooler fans.
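The relationship between water temperature, dry cooler sizing, and economizer hours can be sketched by counting the hours in which ambient air is cold enough to meet the supply setpoint. The approach temperature model, the function name, and the six sample temperatures are illustrative assumptions; a real study would use a full year of hourly weather data for the site.

```python
def economizer_hours(hourly_ambient_c, supply_setpoint_c, approach_c):
    """Count hours in which a dry cooler alone can meet the supply setpoint.

    Assumption: the dry cooler delivers water at ambient + approach_c, so
    free cooling is available whenever ambient <= setpoint - approach.
    """
    limit_c = supply_setpoint_c - approach_c
    return sum(1 for t in hourly_ambient_c if t <= limit_c)


# Made-up sample of hourly dry-bulb temperatures (deg C), not site data.
sample = [5, 12, 18, 24, 30, 35]

print(economizer_hours(sample, supply_setpoint_c=20, approach_c=6))
print(economizer_hours(sample, supply_setpoint_c=32, approach_c=6))
print(economizer_hours(sample, supply_setpoint_c=32, approach_c=2))
```

Raising the setpoint and shrinking the approach (for example with a larger or evaporatively assisted dry cooler) both increase the count, which mirrors the two levers described above.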
7. Utilize Floor-Mounted Manifolds for Retrofits
When retrofitting existing raised-floor environments, prioritize underfloor-mounted technology cooling system (TCS) piping over overhead-mounted options. This minimizes the need for complex overhead structural work and simplifies the connection to the facility water system (FWS).
8. Consistently Use Wide/Deep Racks
Replace legacy server cabinets with racks meeting a minimum dimension of 750 mm wide × 1,200 mm deep. This extra volume is non-negotiable for accommodating the manifolds, PDUs, and thicker power cables required for AI servers.
9. Reinforce Sub-Floors for Point Loads

Conduct a structural engineering review of the raised floor system. For racks exceeding 1,500 kg (approx. 3,300 lb), retrofitting may require the installation of heavy-duty stringers, adjustable pedestals, or load-distributing steel plates to prevent floor collapse under the weight of liquid-cooled clusters.
10. Validate via “Digital Twin” Simulation

Move beyond static spreadsheets for capacity planning. Before deployment, build a physics-based “digital twin” of the electrical and cooling topology. Use this to simulate AI step-load scenarios (e.g., 0% to 100% load in microseconds) to verify that breaker coordination and cooling loops remain stable under synchronous stress.
11. Audit “As-Built” Documentation
Prior to any retrofit design, physically trace and verify all breaker interconnections and piping. Do not rely on existing “as-built” drawings, which are frequently outdated in legacy facilities and can lead to catastrophic cascading failures if redundancies are not wired as documented. Proper commissioning (see the Commissioning and Performance Validation section) ensures systems function as designed.