Artificial intelligence (AI) policy: ASHRAE prohibits the entry of content from any ASHRAE publication or related ASHRAE intellectual property (IP) into any AI tool, including but not limited to ChatGPT. Additionally, creating derivative works of ASHRAE IP using AI is also prohibited without express written permission from ASHRAE. For the full AI policy, click here. 


Operations and Maintenance

AI Data Center Energy Performance Framework


 Impact

Operations and maintenance practices affect the reliability, efficiency, and security of AI data centers. Using telemetry for predictive analytics and automation can reduce downtime and optimize resource utilization, which in turn can extend equipment life and lower costs.

Clear separation of responsibilities between facilities personnel and artificial intelligence/machine-learning (AI/ML) tools strengthens operational reliability and accountability. While AI-driven analytics improve visibility into system performance and failure risk, facilities teams remain responsible for procedural execution, safety, compliance, and operational decision making. Integrating commissioning data, documented procedures, and standards-based operating limits into AI-supported operations reduces unintended consequences and supports consistent outcomes.




Highlights

  1. Deploy real-time monitoring and anomaly detection for critical systems.
  2. Implement AI/ML-based predictive maintenance to minimize unplanned outages.
  3. Design for continuous protection of critical infrastructure from cyber and physical threats.
  4. Develop and implement a comprehensive disaster resilience plan.
  5. Establish continuous training programs aligned with industry certifications.
  6. Define and document operational roles that distinguish human responsibilities from AI/ML-supported functions.
  7. Integrate commissioning and re-commissioning data into operational baselines and AI/ML analytics.
  8. Implement formal methods of procedure (MOPs) and standard operating procedures (SOPs) aligned with control system logic and AI-generated alerts.
  9. Align operational setpoints and control strategies with ASHRAE TC 9.9 guidance and applicable codes and standards.



Discussion

Operations and maintenance in AI data centers include strategies for reliability, efficiency, and security. These data centers use AI/ML technologies for predictive maintenance, real-time monitoring, and automated repair. This enables data center operators and owners to optimize energy usage as well as anticipate and quickly fix failures.

Integrating cybersecurity and physical reliability measures into daily operations is increasingly important. As data centers become more automated and interconnected, they face heightened risks from cyber threats, physical threats, and environmental hazards. Incorporating security protocols, redundant systems, and disaster resilience planning helps maintain continuity even during events such as power disruptions, earthquakes, tornadoes, and hurricanes.

Workforce development is essential to support operations and maintenance. Training programs focused on AI/ML-enabled tools, cybersecurity practices, and emergency response procedures give personnel the skills and tools to manage complex systems.

AI data centers require operations and maintenance programs that balance analytics with disciplined facilities practices. AI/ML systems provide continuous monitoring, anomaly detection, and predictive insights, but rely on accurate inputs, established operating envelopes, and structured human oversight to remain effective. Facilities personnel retain accountability for interpreting results, authorizing actions, and executing maintenance activities safely and correctly.
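
The division of labor described above can be pictured with a minimal Python sketch. It is illustrative only: the `Envelope` and `Recommendation` types, the `review` function, the operator name, and the numeric limits are hypothetical placeholders, not values or interfaces taken from ASHRAE TC 9.9 or any particular control system. The AI/ML tool only proposes a change; the change is released only when it falls inside the documented operating envelope and a named person has approved it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    """Allowed range for one controlled variable (placeholder values only)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass(frozen=True)
class Recommendation:
    """A setpoint change proposed by an AI/ML tool (hypothetical structure)."""
    variable: str
    proposed_value: float


def review(rec: Recommendation, envelope: Envelope,
           approved_by: Optional[str]) -> str:
    """Return the disposition of an AI/ML recommendation.

    The envelope check runs first, so an out-of-range proposal is rejected
    even if someone has already signed off on it.
    """
    if not envelope.contains(rec.proposed_value):
        return "rejected: outside operating envelope"
    if not approved_by:
        return "pending: awaiting human approval"
    return f"authorized by {approved_by}"


# Placeholder limits for a supply-air temperature setpoint, in degrees Celsius.
# Real limits would come from the facility's documented operating envelope.
supply_air = Envelope(low=18.0, high=27.0)

print(review(Recommendation("supply_air_temp_c", 29.0), supply_air, "operator_a"))
print(review(Recommendation("supply_air_temp_c", 24.0), supply_air, None))
print(review(Recommendation("supply_air_temp_c", 24.0), supply_air, "operator_a"))
```

Checking the envelope before the approval keeps the standards-based limit independent of any individual sign-off, which is one possible design choice rather than a requirement stated in this framework.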



Recommended Practices

  • 1. Monitoring and telemetry

    Use real-time sensor data from power and cooling devices to establish a baseline, to optimize electrical power consumption, and to detect deviations. Consider using a digital twin to simulate operation and maintenance scenarios for proactive planning.

  • 2. Predictive maintenance

    Establish thresholds based on the telemetry data that can be used to predict component failures using AI/ML. Alerts can be set up to notify staff of a likely part failure and to trigger ordering of a replacement part when the predicted failure probability exceeds a defined threshold.

  • 3. Cybersecurity and physical reliability

    Implement software and network security protocols combined with physical safeguards (e.g., data center location identification, data center access) to mitigate hacking and deliberate sabotage risks.

  • 4. Disaster recovery planning

    Conduct a geographic risk assessment and maintain a plan for natural disasters (e.g., earthquakes, tornadoes, and hurricanes) that can affect data center operations by threatening physical infrastructure, electrical power continuity, and accessibility. Regularly test emergency response procedures and integrate predictive weather forecasts, watches, and warnings into operations and maintenance planning.

  • 5. Training and workforce development

    Create and maintain a training and workforce development curriculum focused on the deployed maintenance tools, cybersecurity protocols, and emergency response procedures. Include certifications aligned with industry standards (e.g., Uptime Institute, ANSI/BICSI, IFMA).

  • 6. Human versus AI roles

    Document the division of responsibility between facilities personnel (approval, execution, compliance, safety) and AI/ML systems (monitoring, prediction, optimization recommendations).

  • 7. Commissioning integration

    Use initial commissioning and re-commissioning results to define operational baselines and validate AI/ML model inputs, updating baselines after significant system upgrades, additions, or changes.

  • 8. MOPs and SOPs

    Develop, maintain, and periodically review documented procedures for routine operations, maintenance events, abnormal conditions, and alarm responses.

  • 9. Standards alignment

    Verify that AI-driven optimization and facility control strategies operate within limits defined by ASHRAE TC 9.9 and related standards.
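
As a rough illustration of practices 1 and 2 above, the Python sketch below derives a baseline from telemetry samples, flags readings that deviate from it, and maps a predicted failure probability to an action. Everything in it is an assumption made for illustration: the function names, the z-score rule with a limit of 3, the 0.5 and 0.8 probability thresholds, and the sample pump power values are hypothetical and do not come from this framework or from any specific monitoring product. In practice the failure probability would be produced by a trained AI/ML model rather than supplied by hand.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def baseline_stats(samples: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of baseline telemetry.

    Requires at least two samples (a limitation of statistics.stdev).
    """
    return mean(samples), stdev(samples)


def deviations(baseline: Sequence[float], readings: Sequence[float],
               z_limit: float = 3.0) -> List[int]:
    """Indices of readings more than z_limit standard deviations from baseline."""
    mu, sigma = baseline_stats(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as a deviation.
        return [i for i, r in enumerate(readings) if r != mu]
    return [i for i, r in enumerate(readings) if abs(r - mu) / sigma > z_limit]


def maintenance_action(failure_probability: float,
                       alert_at: float = 0.5,
                       order_at: float = 0.8) -> str:
    """Map a predicted failure probability to an action (placeholder thresholds)."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    if failure_probability >= order_at:
        return "order_replacement"
    if failure_probability >= alert_at:
        return "alert"
    return "none"


# Hypothetical pump power readings in kW: a commissioning-era baseline,
# followed by new readings to screen against it.
baseline_kw = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]
readings_kw = [10.1, 10.0, 11.2, 9.9]

print(deviations(baseline_kw, readings_kw))  # [2]
print(maintenance_action(0.85))              # order_replacement
print(maintenance_action(0.60))              # alert
```

A fixed z-score limit is among the simplest possible deviation tests; a production system would more likely use rolling windows or model-based detection, and any resulting action would still pass through the human approval described under practice 6.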
