Artificial intelligence (AI) policy: ASHRAE prohibits the entry of content from any ASHRAE publication or related ASHRAE intellectual property (IP) into any AI tool, including but not limited to ChatGPT. Additionally, creating derivative works of ASHRAE IP using AI is also prohibited without express written permission from ASHRAE. For the full AI policy, click here. 


Energy and Thermal Efficiency

AI Data Center Energy Performance Framework


 Impact

AI data centers demand unprecedented levels of power and cooling, making energy and thermal efficiency central to their viability. The defining characteristic of the shift from traditional to AI data centers is a transition from central processing unit (CPU)–centric to graphics processing unit (GPU)–centric compute.

This inflection point drives new levels of power density, thermal load, and interconnection demand. It prompts new siting, design, and commissioning paradigms and fundamentally reshapes capital investment priorities and operational strategies. Because of this emphasis on GPUs, AI data centers require new frameworks for efficiency, resilience, and performance that extend beyond hardware substitution to encompass the entire ecosystem of energy, cooling, and workload management.

Even as data centers are increasingly engineered around high‑density, GPU‑centric architectures to support AI training and advanced model development, they must still accommodate lower‑density, CPU‑based workloads. This flexibility is essential both during the transition away from traditional compute and for the inference phase of the AI lifecycle likely to follow, where many workloads continue to operate at materially lower power densities. In practice, AI‑optimized facilities will need to support a heterogeneous mix of rack densities and cooling profiles to ensure operational continuity, workload portability, and long‑term asset relevance.

This section outlines how advanced cooling strategies, optimized energy use, and integrated sustainability practices reduce operational costs, mitigate risk, and ensure resilience under high-density AI workloads. By embedding these principles into design and operations, data centers can achieve scalable performance while meeting sustainability commitments and regulatory expectations.



Highlights

  1. Optimize Airflow and Thermal Conditions Through Advanced Air Management
    Implement full containment, precise airflow control, and raised supply‑air setpoints aligned with ASHRAE environmental envelopes. These measures reduce fan energy, stabilize inlet temperatures, and expand the hours where passive or low‑energy cooling modes can operate effectively.

  2. Maximize Economization and Low‑Energy Cooling Pathways
    Integrate airside, waterside, and refrigerant‑based economizers to reduce compressor hours and mechanical cooling demand. This approach aligns with ANSI/ASHRAE/IES Standard 90.1 and ANSI/ASHRAE Standard 90.4 expectations for economizer use and delivers significant reductions in overall facility energy consumption.

  3. Adopt Cooling Architectures Purpose‑Built for AI Rack Densities
    Deploy liquid cooling (direct‑to‑chip, rear‑door heat exchangers) and thermally segmented zones to support 50–100+ kW racks while maintaining compliance with ASHRAE Standard 90.4 and the ASHRAE Thermal Guidelines for allowable and recommended classes. These architectures reduce mechanical load, improve heat removal efficiency, and potentially enable operation at higher temperatures.

  4. Leverage Heat Reuse and Energy Recovery to Reduce Net Site Energy
    Utilize warm‑water loops, district heating connections, and high‑grade heat capture to convert waste heat into a usable resource at the data center, associated buildings, or external buildings/sites. These strategies support emerging metrics such as Energy Reuse Effectiveness (ERE) and reduce the environmental footprint of high‑density AI deployments.

  5. Deploy Low‑ or No‑Water Cooling Technologies and Energy Recovery Ventilation
    Incorporate energy recovery ventilator (ERV) systems, dry coolers, and DX‑based economizers to minimize water consumption and improve sustainability in water‑constrained regions. These technologies support Water Usage Effectiveness (WUE) and Water Usage Impact (WUI) targets, the latter of which factors in water scarcity, and they support long‑term resource stewardship.

  6. Employ Technology Cooling System (TCS) Liquid Cooling Infrastructure

    For purpose-built AI data centers where compute densities routinely reach 50–120 kW per rack and have the potential to trend higher, utilize a TCS. At densities where traditional air‑based cooling becomes insufficient to maintain safe operating temperatures, a TCS provides the integrated set of components needed to capture, transport, and reject heat at scale. Within an energy‑ and thermal‑efficiency framework, a TCS is a primary enabler of high‑density AI workloads, reduced mechanical energy consumption, and expanded opportunities for heat reuse and water‑free cooling.

  7. Use Holistic, Standards‑Aligned Performance Metrics to Drive Continuous Improvement
    Track and report Power Usage Effectiveness (PUE), Water Usage Effectiveness (WUE), Water Usage Impact (WUI), Carbon Usage Effectiveness (CUE), Information Technology Work Capacity (ITWC), which is the server utilization component of Data Center Resource Effectiveness (DCRE), and other metrics defined by The Green Grid to ensure transparency and guide optimization. These metrics provide a unified lens for evaluating energy, water, and carbon performance across the AI data center lifecycle.

  8. Implement Intelligent Controls, Monitoring, and Continuous Commissioning
    Apply advanced controls, real‑time monitoring, digital twins, and ongoing commissioning to maintain compliance with ASHRAE Standard 90.4 performance requirements and ensure systems operate at peak efficiency. This creates a dynamic, self‑optimizing environment capable of adapting to evolving AI workloads.



Discussion

Energy and thermal efficiency in an AI data center refer to the set of design, operational, and technological strategies that minimize energy waste and manage heat effectively while supporting extremely high‑density computing workloads. At its core, energy efficiency focuses on how much useful computational work is delivered per unit of energy consumed, while thermal efficiency addresses how effectively heat is removed or repurposed to maintain safe operating conditions.

Key terminology includes:

  • Power usage effectiveness (PUE), which measures overall facility efficiency;
  • Liquid cooling and direct‑to‑chip cooling, which describe advanced thermal management methods; and
  • Power density, which is the amount of electrical load concentrated in a given rack or space, now often exceeding 50–100 kW per rack in AI environments.
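
As a worked illustration of the PUE definition, the sketch below divides total facility energy by IT equipment energy. The function name and the annual energy figures are assumptions chosen for illustration, not measurements from any facility.

```python
def power_usage_effectiveness(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """PUE = total facility energy / IT equipment energy (dimensionless, at least 1.0)."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be less than IT energy")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical annual figures: 13,000 MWh at the utility meter, 10,000 MWh at the IT load.
pue = power_usage_effectiveness(13_000_000, 10_000_000)
print(f"PUE = {pue:.2f}")  # PUE = 1.30
```

A PUE of 1.30 in this example means that every kilowatt-hour delivered to IT equipment carries 0.30 kWh of cooling, power distribution, and other facility overhead.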

The development that makes this topic so important is the dramatic shift from CPU‑centric to GPU‑centric architectures. AI training clusters generate far more heat, draw significantly more power, and require tighter environmental tolerances than traditional enterprise computing.

This creates a new operational paradigm: cooling systems must be engineered as part of the computing fabric. Grid capacity constraints, sustainability commitments, and rising energy costs further elevate the importance of efficiency. In many regions, the ability to deploy AI infrastructure is limited not by space or capital, but by available local or regional grid capacity (in MW or MVA), by the ability to cool the equipment, and by the need to stay attuned to the neighborhoods in which facilities operate.

The highest impact aspects of energy and thermal efficiency revolve around three areas: scalability, resilience, and sustainability. Efficient thermal management directly influences uptime and performance, ensuring that GPUs can operate at full capability without throttling. Energy‑optimized designs reduce operational costs.

Because AI workloads are growing exponentially, efficiency becomes a gating factor for future expansion—determining how quickly and responsibly organizations can scale their AI capabilities. In short, energy and thermal efficiency are no longer supporting disciplines; they are strategic enablers of AI performance, reliability, and long‑term viability.

Markers of the transition from CPU-centric to GPU-centric compute:

  • Operational profile: Traditional data centers are optimized for transactional, general-purpose compute functions (web hosting, databases, virtualization) where CPUs excel at serial processing.
  • Power density: GPU clusters drive much higher rack-level power densities (often 40–100 kW per rack vs. 5–10 kW in legacy CPU racks), fundamentally changing cooling and energy strategies.
  • AI workloads: Training and inference for large models demand massive parallelism, which GPUs (and increasingly tensor processing units [TPUs] or custom accelerators) deliver far more efficiently.
  • Interconnections: AI data centers rely on high-bandwidth, low-latency fabrics (20 GB/s to 100 GB/s per lane) and on liquid cooling integration to keep GPUs operating, whereas CPU-centric data centers rely on standard Ethernet and storage networks.



Recommended Practices

  • 1. Optimize air management as a foundational control

    Recommendation:
    Implement fundamental air management (hot/cold aisle layout, containment, and minimized bypass and recirculation) to reduce fan energy and increase ΔT across coils.

    Best practices:

    • Hot/cold aisle configuration and containment: Use full or partial containment to prevent supply–return short‑circuiting and stabilize inlet temperatures.
    • Right-size airflow: Avoid oversupplying airflow; design VFDs and BMS control loops so that CRAH/CRAC fan speeds and IT fan control track actual IT load.
    • Raise supply air temperature: Once containment and monitoring are in place, raise supply/inlet temperatures within ASHRAE TC9.9 recommended ranges to improve cooling system efficiency and maximize economizer hours.
    • Continuous monitoring: Deploy granular sensors at rack inlets (not just room-level) and integrate into DCIM/BMS for control and alarms.

    Rationale: Before adding advanced systems, eliminate waste in air distribution and mixing.
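
To show why right-sizing airflow pays off, the sketch below applies the ideal fan affinity law, under which fan power scales with the cube of fan speed. This is an idealization: real fan savings depend on static pressure, fan curves, and drive losses, and the function name is hypothetical.

```python
def fan_power_fraction(speed_fraction: float) -> float:
    """Ideal fan affinity law: power scales with the cube of speed (and airflow)."""
    if not 0.0 <= speed_fraction <= 1.0:
        raise ValueError("speed_fraction must be between 0 and 1")
    return speed_fraction ** 3


for speed in (1.0, 0.9, 0.8, 0.7):
    print(f"{speed:.0%} airflow -> {fan_power_fraction(speed):.0%} of full-speed fan power")
```

Under this idealization, trimming airflow by 20% cuts fan power roughly in half, which is why containment and VFD tuning come before more advanced cooling investments.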

  • 2. Maximize economization and free cooling (airside, waterside, refrigerant)

    Recommendation:
    Integrate economization as a fundamental design strategy with climate-zone appropriate solutions: airside, waterside, and refrigerant-based free cooling.

    Best practices:

    • Airside economizers: Use filtered outdoor air when temperature and humidity are within acceptable envelopes; design filtration and corrosion control per ASHRAE guidance.
    • Waterside economizers / dry coolers: Use fluid coolers or dry coolers to reject heat without chiller compressors when ambient temperature permits. In colder climates, this can carry much of the annual load.
    • Refrigerant free-cooling DX systems: For sites without chilled water, use DX systems with economizer modes that bypass or reduce compressor operation when outdoor conditions allow.
    • Control integration: Tie economizers to predictive controls that consider forecasted weather, IT load, and redundancy to prioritize free cooling over mechanical compression.

    Rationale: Reduce mechanical cooling hours and compressor energy by optimizing around ambient conditions.
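
A rough way to see how supply setpoints interact with economization is to count the hours in which outdoor air is cool enough to meet the setpoint without compressors. The sketch below is a simplified dry-bulb screen only: it ignores humidity and air-quality limits, the 3 K approach temperature is an assumed placeholder, and the temperature profile is invented for illustration.

```python
def economizer_hours(outdoor_temps_c, supply_setpoint_c, approach_c=3.0):
    """Count hours in which outdoor dry-bulb temperature is low enough to meet the
    supply setpoint without compressors, allowing for a heat-exchanger approach."""
    threshold_c = supply_setpoint_c - approach_c
    return sum(1 for temp_c in outdoor_temps_c if temp_c <= threshold_c)


# Hypothetical 24-hour dry-bulb profile (deg C); a real study would use 8,760 hourly values.
day = [8, 7, 7, 6, 6, 7, 9, 12, 15, 18, 21, 23,
       25, 26, 26, 25, 23, 20, 17, 14, 12, 11, 10, 9]

low_setpoint_hours = economizer_hours(day, supply_setpoint_c=18.0)
high_setpoint_hours = economizer_hours(day, supply_setpoint_c=27.0)
print(low_setpoint_hours, high_setpoint_hours)  # 14 20
```

In this invented profile, raising the supply setpoint from 18 °C to 27 °C extends compressor-free operation from 14 to 20 hours of the day.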

  • 3. Align cooling architecture with AI rack densities

    Recommendation:
    Adopt liquid or liquid-assisted architectures (rear-door heat exchangers, direct-to-chip, immersion) for AI clusters while retaining air for lower-density zones. Liquid is an inherently more efficient heat transfer medium than air, so direct‑to‑chip cooling captures heat more effectively, enabling high-density racks with lower cooling energy and reduced reliance on CRAC/CRAH units.

    Best practices:

      • Segment AI and non‑AI loads: Physically and logically separate AI/HPC halls from general computing functions to allow different setpoints, cooling, and redundancy approaches.
      • Use ASHRAE thermal classes: Design inlet conditions around ASHRAE Thermal Guidelines to widen allowable temperature/humidity bands and reduce overcooling.
      • Design for modular scalability: Use manifolded liquid distribution (CDUs, secondary loops) that can scale with incremental AI deployments instead of rebuilding air systems for each density step.

    Rationale: Match cooling topology to AI density class thermal loads (60–120 kW/rack and above) rather than legacy 5–15 kW density baselines.
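
The physical reason liquid suits high-density racks can be sketched with the sensible-heat relation Q = ρ · V · cp · ΔT, solved for the volumetric flow V needed to carry a given heat load. The fluid properties below are approximate textbook values, and the 100 kW rack and temperature rises are assumptions for illustration.

```python
AIR = {"density_kg_m3": 1.2, "cp_kj_kg_k": 1.005}     # approximate, near room conditions
WATER = {"density_kg_m3": 997.0, "cp_kj_kg_k": 4.18}  # approximate, near 25 deg C


def volumetric_flow_m3_s(heat_kw: float, delta_t_k: float, fluid: dict) -> float:
    """Flow needed to carry heat_kw at a given temperature rise: V = Q / (rho * cp * dT)."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_kw / (fluid["density_kg_m3"] * fluid["cp_kj_kg_k"] * delta_t_k)


rack_kw = 100.0  # assumed AI rack heat load
air_flow = volumetric_flow_m3_s(rack_kw, delta_t_k=12.0, fluid=AIR)
water_flow = volumetric_flow_m3_s(rack_kw, delta_t_k=10.0, fluid=WATER)

print(f"Air:   {air_flow:.2f} m^3/s")
print(f"Water: {water_flow * 1000:.2f} L/s")
```

Under these assumptions, the same rack needs several cubic meters of air per second but only a few liters of water per second, a difference of more than three orders of magnitude in volumetric flow.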

  • 4. Deploy heat reuse and energy recovery strategies

    Recommendation:
    Pursue heat recovery where there is a viable sink (district heating, nearby buildings, on-site process loads, or functions that require heat input) and track performance via metrics like ERE and Energy Reuse Factor (ERF).

    Best practices:

    • High-grade heat from liquid cooling: Direct-to-chip and warm-water liquid loops provide higher-grade heat (higher supply/return temperatures) that’s more suitable for reuse than low-grade air exhaust.
    • District or campus integration: Where feasible, connect to campus loops for building heating, domestic hot water pre-heat, or absorption chilling.
    • Energy reuse metrics: Use ERE/ERF alongside PUE to capture how much useful energy is exported rather than solely minimizing facility energy.
    • Design for future reuse: Even if a heat sink doesn’t exist at day one, design headers, temperatures, and isolation so reuse can be added later with minimal disruption and at a lower cost.

    Rationale: Convert “waste” heat from AI loads into usable thermal energy, improving overall site and district efficiency.
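
The reuse metrics named above can be sketched from annual energy totals, using the commonly cited Green Grid definitions: ERF is reused energy divided by total facility energy, and ERE is total facility energy minus reused energy, divided by IT energy. The function names and figures below are hypothetical.

```python
def energy_reuse_factor(reused_kwh: float, total_facility_kwh: float) -> float:
    """ERF = reused energy / total facility energy (0 means no reuse)."""
    return reused_kwh / total_facility_kwh


def energy_reuse_effectiveness(total_facility_kwh: float, reused_kwh: float,
                               it_kwh: float) -> float:
    """ERE = (total facility energy - reused energy) / IT energy."""
    return (total_facility_kwh - reused_kwh) / it_kwh


# Hypothetical annual totals in kWh: facility, IT load, and heat exported for reuse.
total_kwh, it_kwh, reused_kwh = 13_000_000, 10_000_000, 2_600_000

erf = energy_reuse_factor(reused_kwh, total_kwh)
ere = energy_reuse_effectiveness(total_kwh, reused_kwh, it_kwh)
pue = total_kwh / it_kwh

print(f"ERF = {erf:.2f}, ERE = {ere:.2f}, PUE = {pue:.2f}")
# The two metrics are linked by ERE = (1 - ERF) * PUE.
```

Because PUE is unchanged by exported heat, reporting ERE or ERF alongside it is what makes the benefit of reuse visible.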

  • 5. Use energy recovery ventilation and low/no‑water air treatment

    Recommendation:
    Apply energy recovery ventilation (ERV) where outdoor air is required (offices, support spaces, or airside economizer systems), using enthalpy wheels or plates to transfer heat and, where applicable, moisture between exhaust and intake airstreams.

    Best practices:

    • ERV in mixed-use spaces: Data halls often use low outdoor air (OA) fractions, but office and support areas benefit from ERV to minimize HVAC loads.
    • Pair ERV with airside economizers: Recover energy when OA is less favorable; bypass when conditions are in the economizer window.
    • Low/no‑water cooling technologies:
      • Refrigerant-based DX with economizer modes (no tower water).
      • Closed-loop dry coolers instead of open cooling towers to reduce or eliminate water use.
      • Indirect evaporative cooling with very low water usage where water is acceptable but must be tightly managed; limit to climates where WUE and risk tradeoffs are acceptable.

    Rationale: Improve ventilation and air treatment efficiency without increasing water consumption.

  • 6. Employ technology cooling systems (TCS) for liquid cooling for purpose-built AI data centers with high compute densities

    Recommendation:
    Align TCS design with the data center's density roadmap and sustainability objectives. Different liquid-cooling architectures impose distinct TCS design considerations:

    • Direct-to-Chip (D2C): Supports warm-water cooling and high economization hours.
    • Rear-Door Heat Exchangers (RDHx): Hybrid approach that reduces room heat load short of full liquid adoption.
    • Immersion Cooling: Requires compatibility with dielectric fluids and tank-integrated heat exchangers and offers higher heat-reuse potential.

    Understanding these distinctions is beneficial to operators in aligning TCS design with critical production space densification plans.

    Best practices:

    • System Architecture and Functional Scope: A complete TCS encompasses both IT‑side and facility‑side liquid loops, along with the mechanical and digital controls that coordinate them. Core elements include:
      • Coolant Distribution Units (CDUs): Interface between IT loops and facility water systems, providing heat exchange, pumping, and temperature control.
      • Direct‑to‑Chip (D2C) and Immersion Cooling Interfaces: Cold plates, manifolds, immersion tanks, and associated sensors.
      • Rear‑Door Heat Exchangers (RDHx): Hybrid air–liquid systems that reduce room‑level heat loads.
      • Pumps, Valves, and Piping Networks: Variable‑speed pumping, isolation valves, and redundant distribution paths.
      • Heat Rejection Systems: Dry coolers, adiabatic systems, heat‑pump integration, and district‑heating interfaces.
      • Instrumentation and Controls: Temperature, pressure, flow, and leak‑detection sensors integrated with BMS/DCIM platforms.

    This system‑level view ensures that TCS is treated not as a point solution but as a coordinated thermal ecosystem.

    • Controls, Automation, and Optimization. Modern TCS performance is reliant on intelligent control strategies:
      • Variable‑Speed Pumping: Matches pump energy to real‑time thermal load.
      • Differential Pressure and Flow Control: Ensures stable operation across diverse rack densities.
      • Supply Temperature Optimization: Maximizes free‑cooling hours and minimizes compressor use.
      • Predictive Thermal Modeling: Anticipates load spikes from AI training cycles.
      • AI‑Driven Thermal Orchestration: Integrates workload placement with thermal zones to reduce mechanical energy.

    These capabilities transform TCS from a passive cooling system into an active efficiency engine.

    • Reliability and Redundancy. AI data centers require mission‑critical reliability. TCS designs should incorporate:
      • N+1 or 2N pumping and heat‑exchange redundancy
      • Dual IT and facility loops for isolation and maintenance
      • Continuous leak detection and containment
      • Redundant sensors and telemetry paths
      • Emergency heat rejection strategies (e.g., dry coolers on backup power)

    These measures ensure thermal stability during failures, maintenance, or power transitions.

    • High-Temperature Operation and Water Efficiency. Liquid cooling enables high‑temperature operation that reduces or eliminates evaporative cooling:
      • Dry Cooler Integration: Supports water‑free heat rejection in many climates.
      • Reduced WUE, WUI: Minimizes or eliminates evaporative tower use.
      • Climate‑Responsive Operation: Adjusts supply temperatures to maximize water savings.
      • Heat‑Pump Integration: Enables heat reuse without water consumption.

    This aligns with sustainability goals and prepares operators for shifting regulatory expectations around water stewardship.

    • Heat Reuse Enablement. TCS is the enabling layer for several practical heat‑reuse strategies (see Recommendation 4):
      • High‑Temperature Outlet Water: Supports district heating, domestic hot water, and industrial processes.
      • Secondary Heat‑Reuse Loops: Provide hydraulic and thermal separation.
      • Temperature Lift Considerations: Evaluate pump energy vs. reuse value.
      • Standards Alignment: EN 50600‑4‑6 and ISO/IEC 30134‑6.

    As heat reuse is a potential sustainability metric, TCS design directly influences an operator’s ability to implement and track it.

    • Future-Proofing. As AI workloads continue to scale, TCS should include provisions to support:
      • Increasingly higher rack densities (exceeding 200–300 kW).
      • Modular, skid‑based TCS deployments for rapid expansion.
      • Integration with low‑GWP refrigerant systems and electrified heat pumps.
      • AI‑driven thermal orchestration across entire campuses.
      • Full transition from hybrid to predominantly liquid‑cooled environments.

    These trends position TCS as a strategic investment that shapes the long‑term efficiency and sustainability profile of AI data centers.

    Rationale: Liquid cooling delivers significant efficiency advantages over air‑based systems, including:

    • Higher Heat‑Transfer Efficiency
    • Elevated Supply Temperatures
    • Reduced Mechanical Load
    • Improved IT Efficiency

    These characteristics position TCS as a foundational strategy for achieving low PUE, high thermal efficiency, and predictable performance at scale.
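
The N+1 redundancy practice listed above can be illustrated with a small sizing sketch for coolant distribution units: enough units to carry the load (N), plus spares. The hall size and CDU capacity are invented, and the function is a hypothetical helper, not a vendor sizing method; a 2N design would instead double N.

```python
import math


def cdus_required(total_heat_kw: float, cdu_capacity_kw: float, spares: int = 1) -> int:
    """Units needed to carry the heat load (N) plus the chosen number of spares."""
    if total_heat_kw <= 0 or cdu_capacity_kw <= 0:
        raise ValueError("heat load and CDU capacity must be positive")
    n = math.ceil(total_heat_kw / cdu_capacity_kw)
    return n + spares


# Hypothetical hall: 40 racks at 100 kW each, served by CDUs rated at 1,000 kW.
print(cdus_required(40 * 100, 1000))  # 5
```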

  • 7. Design for holistic energy performance metrics (PUE, WUE, WUI, DCRE, CUE, IT utilization, ITWC)

    Recommendation:
    Adopt a metric stack that ties thermal decisions to global performance: PUE for energy, WUE for water, and CUE for carbon, in conjunction with IT utilization and server efficiency metrics.

    Best practices:

    • PUE as a control target: Use near-real-time PUE tracking and trend analysis to identify impacts of setpoint changes, economizer operation, and liquid-cooling adoption.
    • WUE and WUI to inform water-constrained strategies: Where water is constrained, use WUI to bound evaporative approaches and favor dry solutions.
    • IT-first efficiency: Combine thermal management with server refresh, consolidation, and power management to avoid efficiently cooling underutilized hardware.
    • Tie to AI workload patterns: Align cooling capacity and controls with known AI training/inference cycles and scheduling to avoid overprovisioning for short-lived peaks.

    Rationale: Ensure cooling and thermal strategies support overall energy performance and enterprise sustainability objectives, not just localized efficiency.
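
The water and carbon metrics in this stack follow the same pattern as PUE: a resource total divided by IT energy. The sketch below uses the commonly cited Green Grid forms, WUE in liters per kWh of IT energy and CUE in kg CO2e per kWh of IT energy. The grid emission factor and annual totals are assumptions, and the CUE helper assumes all energy comes from a single grid source.

```python
def water_usage_effectiveness(site_water_liters: float, it_kwh: float) -> float:
    """WUE = annual site water use (L) / IT equipment energy (kWh)."""
    return site_water_liters / it_kwh


def carbon_usage_effectiveness(total_facility_kwh: float, it_kwh: float,
                               grid_kgco2e_per_kwh: float) -> float:
    """CUE = emissions from facility energy (kg CO2e) / IT equipment energy (kWh).
    Assumes every kWh carries the same grid emission factor."""
    return total_facility_kwh * grid_kgco2e_per_kwh / it_kwh


# Hypothetical annual totals.
it_kwh = 10_000_000
wue = water_usage_effectiveness(site_water_liters=4_000_000, it_kwh=it_kwh)
cue = carbon_usage_effectiveness(total_facility_kwh=13_000_000, it_kwh=it_kwh,
                                 grid_kgco2e_per_kwh=0.35)

print(f"WUE = {wue:.2f} L/kWh, CUE = {cue:.3f} kgCO2e/kWh")
```

With a single grid source, CUE equals the emission factor multiplied by PUE, so efficiency gains and cleaner supply both move the same metric.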

  • 8. Integrate controls, monitoring, and digital twin for continuous optimization

    Recommendation:
    Leverage advanced controls and modeling to tune cooling systems continuously as AI loads evolve.

    Best practices:

    • Model-based design: Use computational fluid dynamics (CFD) and energy modeling in the design phase to validate aisle configuration, supply temperatures, and liquid-routing strategies under AI‑class densities.
    • Automated control sequences: Implement supply-air and water-temperature reset, fan-speed optimization, and dynamic economizer enablement.
    • Digital twin / ongoing calibration: Use a digital twin or simulator to test operational changes virtually before deployment and to validate design assumptions as loads grow and shift.
    • Continuous commissioning: Integrate periodic recommissioning of cooling and controls into standard operating procedures.

    Rationale: Treat energy efficiency as an ongoing operational program, not a one-time design.
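
An automated supply-air reset of the kind listed above can be sketched as a single control step driven by the hottest rack inlet. This is a simplified illustration, not a production sequence: the 18–27 °C bounds are assumed from the commonly cited ASHRAE recommended inlet range, and the margin, step size, and function name are hypothetical.

```python
RECOMMENDED_MIN_C = 18.0  # assumed lower bound of the recommended inlet range
RECOMMENDED_MAX_C = 27.0  # assumed upper bound of the recommended inlet range


def next_supply_setpoint(current_setpoint_c, rack_inlet_temps_c,
                         inlet_limit_c=RECOMMENDED_MAX_C, margin_c=2.0, step_c=0.5):
    """One reset step: lower the setpoint if any rack inlet exceeds the limit,
    raise it if the hottest inlet has comfortable margin, otherwise hold."""
    hottest_c = max(rack_inlet_temps_c)
    if hottest_c > inlet_limit_c:
        proposed_c = current_setpoint_c - step_c
    elif hottest_c < inlet_limit_c - margin_c:
        proposed_c = current_setpoint_c + step_c
    else:
        proposed_c = current_setpoint_c
    # Keep the setpoint inside the assumed recommended range.
    return min(max(proposed_c, RECOMMENDED_MIN_C), RECOMMENDED_MAX_C)


print(next_supply_setpoint(20.0, [22.1, 23.4, 24.0]))  # 20.5 (margin available, raise)
print(next_supply_setpoint(24.0, [26.0, 27.6]))        # 23.5 (inlet over limit, lower)
print(next_supply_setpoint(24.0, [25.5, 26.2]))        # 24.0 (inside deadband, hold)
```

Driving the reset from rack-inlet sensors, not room-level averages, is what allows the setpoint to rise safely as containment improves.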



Where to Learn More

AI Data Center Energy & Thermal Efficiency — Standards Mapping Matrix

This matrix maps each recommendation outlined above to the relevant ASHRAE Standard 90.4, ASHRAE TC 9.9 Thermal Guidelines, DOE Best Practices, and The Green Grid (TGG) and other standards/metrics.

  1. Cooling architecture aligned with AI rack densities (liquid cooling, segmentation, thermal classes)

    • Standard 90.4: Supports lower mechanical load component (MLC) through reduced fan power and efficient heat removal; recognizes liquid cooling as a pathway to compliance
    • ASHRAE TC 9.9 Thermal Guidelines: Defines thermal classes and allowable inlet conditions enabling warm‑water liquid cooling
    • DOE Best Practices: Recommends liquid cooling for high‑density AI/HPC to reduce mechanical energy
    • TGG/Other Standards: Improves PUE, supports CER, and enables higher‑grade heat for ERE/ERF; TGG DCRE metric

  2. Air management optimization (containment, airflow control, raised setpoints)

    • Standard 90.4: Good air management is assumed for achieving MLC; supports economizer operation
    • ASHRAE TC 9.9 Thermal Guidelines: Provides recommended/allowable temperature and humidity ranges enabling higher setpoints
    • DOE Best Practices: Identifies containment, airflow tuning, and supply‑air reset as foundational efficiency practices
    • TGG/Other Standards: Direct lever for improving PUE and cooling sub‑metrics

  3. Economization and free cooling (airside, waterside, refrigerant)

    • Standard 90.4: Economizer strategies reduce MLC
    • ASHRAE TC 9.9 Thermal Guidelines: Environmental envelopes enable safe use of airside economizers
    • DOE Best Practices: Strongly promotes airside/waterside economizers to reduce compressor hours
    • TGG/Other Standards: Improves PUE; supports heat‑reuse metrics when paired with recovery

  4. Heat reuse and energy recovery (district heating, warm‑water loops)

    • Standard 90.4: Allows credits for heat recovery and shared-space economizers
    • ASHRAE TC 9.9 Thermal Guidelines: Higher liquid temperatures align with allowable inlet ranges
    • DOE Best Practices: Encourages heat recovery to reduce net site energy
    • TGG/Other Standards: Uses ERE and ERF to quantify beneficial heat reuse

  5. Energy recovery ventilation and low/no‑water cooling (ERV, dry coolers, DX economizer)

    • Standard 90.4: Supports efficient HVAC/heat rejection
    • ASHRAE TC 9.9 Thermal Guidelines: Defines humidity/temperature envelopes that reduce evaporative dependence
    • DOE Best Practices: Recommends ERV and dry cooling for low‑water, high‑efficiency operation
    • TGG/Other Standards: Improves WUE; supports PUE stability in water‑constrained designs; TGG WUI metric

  6. Employ technology cooling systems (TCS) for liquid cooling for purpose-built AI data centers for high compute densities

    • Standard 90.4: MLC and liquid-cooling efficiency pathways
    • ASHRAE TC 9.9 Thermal Guidelines: Liquid‑cooling guidelines and environmental envelopes (e.g., application of W Classes)
    • Separation of FWS from TCS
    • Warm Water Operation / High ΔT
    • Integration of control, monitoring, and water quality management
    • TGG/Other Standards: ISO/IEC 30134 Series (PUE, WUE, HRE, and other KPI definitions); EN 50600 (European data center design and operational standards); TGG (foundational efficiency metrics and best practices)

  7. Holistic performance metrics (PUE, WUE, CUE, WUI, DCRE)

    • Standard 90.4: Complements MLC/ELC with operational metrics
    • ASHRAE TC 9.9 Thermal Guidelines: Ensures environmental conditions align with IT reliability
    • DOE Best Practices: Promotes PUE, CUE, and IT utilization as core KPIs
    • TGG/Other Standards: TGG is originator of PUE, WUE, WUI, CUE, ERE, ERF, and DCRE

  8. Controls, monitoring, modeling, digital twin, continuous commissioning

    • Standard 90.4: Efficient control sequences are required to maintain MLC/ELC compliance
    • ASHRAE TC 9.9 Thermal Guidelines: Monitoring ensures adherence to thermal envelopes
    • DOE Best Practices: Emphasizes continuous commissioning, EMCS, and modeling
    • TGG/Other Standards: Monitoring required for accurate PUE/WUE/CUE/ERE/ERF reporting



Case Studies

MIT Lincoln Laboratory Supercomputing Center (LLSC)

The MIT Lincoln Laboratory Supercomputing Center (LLSC) is a purpose‑built high‑performance computing (HPC) facility designed to support AI, modeling, and advanced analytics workloads. Its architecture reflects a shift from traditional enterprise data centers to a high‑density, GPU‑centric environment, with infrastructure engineered around energy efficiency, thermal performance, and operational flexibility. The facility combines high‑density compute clusters with resilient power and cooling systems that can evolve as AI rack densities and workload profiles grow over time.

LLSC emphasizes a “cooling‑first” design approach that integrates liquid‑ready infrastructure, optimized airflow, and warm‑temperature operation to reduce mechanical cooling energy. Air management features such as hot/cold aisle containment, supply‑air setpoint optimization, and close‑coupled cooling are paired with economization strategies where climate or seasonality permits. The center uses comprehensive monitoring and controls to continuously tune performance, leveraging real‑time data from ITE, mechanical, and electrical systems to maintain efficiency under dynamic AI workloads.

Holistic performance management is central to the LLSC design philosophy. The facility tracks power at multiple levels, focuses on workload utilization, and uses key performance indicators such as PUE, water use, and capacity utilization to guide operational decisions. That combination of architectural choices, thermal strategy, and data‑driven operations makes LLSC a compelling reference model for AI data centers seeking to balance density, performance, and operational costs.

Meta AI Research SuperCluster (RSC)

Meta’s AI Research SuperCluster (RSC) is a large‑scale AI training environment built to support foundation models, classification systems, and generative AI at hyperscale. It represents Meta’s pivot from CPU‑centric, air‑cooled web workloads to GPU‑dense clusters that demand radically higher rack power, network bandwidth, and thermal capacity. [1, 2]

RSC is part of a broader redesign of Meta’s data center platform: existing campuses are being “rescoped” for AI, and new builds are engineered from the ground up for liquid‑ready, high‑density infrastructure. The facilities integrate direct‑to‑chip liquid cooling for GPU servers, hybridized with air cooling for traditional x86 and storage workloads, allowing Meta to scale AI capacity without stranding legacy compute. [1, 2]

Thermally, Meta’s next‑generation AI data centers are designed around high rack densities (moving from ~20 kW to well over 100 kW per rack) and the need for tightly coupled, low‑latency GPU fabrics. This drives a “cooling‑first” architecture that combines liquid cooling, optimized airflow for remaining air‑cooled loads, and evolving approaches to low‑ or no‑water heat rejection aligned with Meta’s water‑positive and sustainability goals. [2, 3]

Operationally, Meta is using AI‑optimized controls, extensive telemetry, and iterative redesign of its campuses to match rapidly changing AI workloads. Construction pauses and redesigns in 2022–2023 were explicitly used to re‑scope facilities around GPU clusters, liquid cooling, and 24–32× increases in networking capacity—embedding performance, efficiency, and flexibility into the platform rather than treating AI as a bolt‑on. 

Comparison table: LLSC and Meta AI RSC Implementation vs Best Practices

  1. Cooling architecture aligned with AI rack densities

    • LLSC implementation: Designed for high‑density HPC/AI racks with liquid‑ready and close‑coupled cooling options to support current and future GPU clusters.
    • Meta AI RSC implementation: Redesign of data centers around GPU‑dense clusters, with partially liquid‑cooled architecture (direct‑to‑chip for GPUs) and high‑density power/networking to support large AI training fabrics. [1, 2]

  2. Air management optimization

    • LLSC implementation: Uses structured hot/cold aisle layout, containment, and controlled airflow paths to stabilize inlet temperatures and reduce fan energy.
    • Meta AI RSC implementation: Retains air cooling for traditional x86 and storage tiers, with data halls laid out to segregate AI liquid‑cooled zones from air‑cooled infrastructure, enabling targeted airflow management and fan energy control for non‑GPU loads. [2]

  3. Economization and free cooling

    • LLSC implementation: Integrates climate‑appropriate economization strategies (e.g., waterside/airside where viable) to reduce compressor runtime and overall cooling energy.
    • Meta AI RSC implementation: New campuses (e.g., dry‑cooled sites) are designed to minimize or eliminate cooling water use, leveraging climate‑appropriate dry cooling and high‑efficiency heat rejection to reduce compressor and chiller hours while meeting AI thermal loads. [3]

  4. Heat reuse and energy recovery

    • LLSC implementation: Operates with elevated coolant/air temperatures compatible with future heat recovery or warm‑water reuse strategies at campus/district level.
    • Meta AI RSC implementation: Liquid cooling and higher‑temperature loops for GPU cold plates create a pathway for future warm‑water heat recovery or district‑scale reuse (inference based on liquid and warm‑loop design trends in Meta’s next‑gen facilities). [1, 3]

  5. Energy recovery ventilation and low/no‑water cooling

    • LLSC implementation: Evaluates low‑water cooling options and energy‑aware ventilation strategies consistent with long‑term sustainability and resiliency objectives.
    • Meta AI RSC implementation: AI‑optimized campuses include “dry‑cooling” concepts to achieve zero water demand for cooling at some sites, supporting Meta’s 2030 water‑positive goal and reducing dependence on evaporative systems. [3]

  6. TCS for liquid cooling

    • LLSC implementation: N/A
    • Meta AI RSC implementation: Standardizing direct‑to‑chip liquid cooling for GPUs as a core technology cooling system, with supply‑water temperature strategies and distribution designs tuned for rising rack densities and future liquid‑cooled hardware generations. [1, 2]

  7. Holistic performance metrics (PUE, WUE, CUE, utilization)

    • LLSC implementation: Monitors power at IT and facility levels, tracks PUE and utilization, and uses these metrics as operational levers for continuous optimization.
    • Meta AI RSC implementation: Publicly ties data center design to energy and water goals (including water‑positive by 2030), using efficiency metrics and resource‑use KPIs to guide shifts from evaporative to dry cooling and from air to liquid cooling across its AI fleet. [1, 3]

  8. Controls, monitoring, modeling, digital twin, continuous commissioning

    • LLSC implementation: Employs centralized monitoring, advanced controls, and ongoing tuning/commissioning to align mechanical operation with rapidly changing AI/HPC workloads.
    • Meta AI RSC implementation: RSC and next‑gen AI data centers are the result of iterative redesign, with Meta pausing and rescoping projects to align mechanical, electrical, and network systems with AI workloads, and using AI‑driven planning and telemetry to continuously refine design and operations. [1, 2]

[1] Report: Meta Plans Shift to Liquid Cooling in AI-Centric Data Center Redesign. Data Center Frontier.

[2] How Meta redesigned its data centers for the AI era. DCD.

[3] Meta’s Liquid Cooling 2025: Inside the $65B AI Overhaul. EnkiAI.
