Simulator-based reinforcement learning for data center cooling optimization

We’re sharing more about the role that reinforcement learning plays in helping us optimize our data centers’ environmental controls.
Our reinforcement learning-based approach has helped us reduce energy consumption and water usage across various weather conditions in our data centers.
Meta is revamping its new data center design to optimize for artificial intelligence, and the same methodology will be applicable to future data center optimizations as well.

Efficiency is one of the key components of Meta’s approach to designing, building, and operating sustainable data centers. Besides the IT load, cooling is the primary consumer of energy and water in the data center environment. Improving cooling efficiency helps reduce our energy use, water use, and greenhouse gas (GHG) emissions, and also helps address one of the greatest challenges of all: climate change.

Most of Meta’s existing data centers use outdoor air and evaporative cooling systems to maintain environmental conditions within an envelope of temperature between 65°F and 85°F (18°C and 30°C) and relative humidity between 13% and 80%. Because water and energy are consumed in conditioning this air, optimizing the amount of supply airflow that has to be conditioned is a high priority for improving operational efficiency.

Since 2021, we have been leveraging AI to optimize the amount of airflow supplied into data centers for cooling purposes. Using simulator-based reinforcement learning, we have, on average, reduced the supply fan energy consumption at one of the pilot regions by 20% and water usage by 4% across various weather conditions.

Previously, we shared how a physics-based thermal simulator helps us optimize our data centers’ environmental controls. Now, we’ll shed more light on the role of reinforcement learning in the solution. As Meta revamps its new data center design to optimize for artificial intelligence, the same methodology will be applicable to future data center optimizations to improve operational efficiency.

Currently, Meta’s data centers adopt a two-tiered penthouse design that uses 100% outside air for cooling. As shown in Figure 1, the air enters the facility through louvers on the second-floor “penthouse,” with modulating dampers regulating the volume of outdoor air. The air passes through a mixing room, where outdoor air that is too cold can be mixed with heat from server exhaust to regulate the temperature.

The air then passes through a series of air filters and a misting chamber, where the evaporative cooling and humidification (ECH) system is used to further control the temperature and humidity. The air continues through a fan wall that pushes it through openings in the floor that serve as an air shaft leading into the server area on the first floor. The hot air coming out of the server exhaust is contained in the hot aisle, passes through exhaust shafts, and is eventually released from the building with the help of relief fans.

Water is mainly used in two ways: evaporative cooling and humidification. The evaporative cooling system converts water into vapor to lower the temperature when the outside air is too hot, while the humidification process maintains the humidity level when the air is too dry. As a result of this design, we believe Meta’s data centers are among the most advanced, energy- and water-efficient data centers in the world.

Figure 1: The penthouse cooling system inside Meta’s data centers.

To supply air within the defined operating envelope, the penthouse relies on the building management system (BMS) to monitor and control different components of the mechanical system. This system conditions the intake air from outside by mixing, humidifying/dehumidifying, evaporative cooling, or a combination of these operations.

There are three major control loops responsible for adjusting setpoints for supply air: temperature, humidity, and airflow. The airflow setpoint is typically calculated from a small set of input variables such as current IT load, cold aisle temperature, and differential pressure between the cold aisle and hot aisle. The logic is usually very simple at a linear scale, but becomes very difficult to model accurately because these values at different locations in the data center are coupled to one another and highly dependent on complex local boundary conditions. However, the amount of airflow largely dictates the energy used by the supply fan arrays, as well as the water consumed when cooling or humidification is needed. Therefore, optimizing the airflow setpoint would have the greatest impact on further improving cooling efficiency, given that the temperature and humidity boundaries of the operating envelope are fixed.
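As a rough illustration of the kind of simple, roughly linear airflow logic described above, here is a minimal sketch. The function name, gains, targets, and limits are all hypothetical values chosen only to make the example run; this is not Meta's BMS logic.

```python
def baseline_airflow_setpoint_cfm(
    it_load_kw: float,
    cold_aisle_temp_f: float,
    diff_pressure_inwc: float,
    cfm_per_kw: float = 120.0,             # hypothetical airflow per kW of IT load
    temp_target_f: float = 80.0,           # hypothetical cold aisle target
    temp_gain_cfm_per_f: float = 5_000.0,  # hypothetical correction per deg F over target
    dp_target_inwc: float = 0.02,          # hypothetical cold-to-hot aisle pressure target
    dp_gain_cfm_per_inwc: float = 1_000_000.0,
    min_cfm: float = 50_000.0,
    max_cfm: float = 1_000_000.0,
) -> float:
    """Return an airflow setpoint from three inputs using a clamped linear rule."""
    # Base airflow scales linearly with IT load.
    base = cfm_per_kw * it_load_kw
    # Add airflow only when the cold aisle runs warmer than the target.
    temp_term = temp_gain_cfm_per_f * max(0.0, cold_aisle_temp_f - temp_target_f)
    # Add airflow only when cold-to-hot aisle pressure falls below the target.
    dp_term = dp_gain_cfm_per_inwc * max(0.0, dp_target_inwc - diff_pressure_inwc)
    # Clamp to the fan array's allowed range.
    return min(max_cfm, max(min_cfm, base + temp_term + dp_term))
```

A rule like this is easy to reason about, but it treats each input independently, which is exactly the limitation the coupling between locations exposes.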

Reinforcement learning (RL) is good at modeling control systems as sequential state machines. It functions as a software agent that determines what action to take at each state based on some transition model, which leads to a different state, and constantly gets feedback from the environment in the form of a reward. Eventually, the agent learns the best policy model (typically parameterized by a deep neural network) to achieve the optimal accumulated reward. Data center cooling control can be naturally modeled under this paradigm.

At any given time, the state of a data center can be represented by a set of environmental variables monitored by many different sensors for outdoor air, supply air, cold aisle, and hot aisle, plus IT load (i.e., power consumption by servers), etc. The action is to control setpoints, for example the supply airflow setpoint that determines how fast the supply fans run to meet the demand. The policy is the function mapping from the state space to the action space (i.e., determining the appropriate airflow setpoint based on current state conditions). The task is then to leverage the historical data we have collected from thousands of sensors in our data centers, augmented with simulated data for possible but not yet experienced conditions, and train a better policy model that gives us a better reward in terms of energy or water usage efficiency.
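The state, action, and policy described above can be sketched as plain types. This is a minimal illustration under stated assumptions: the field names, the `DataCenterState` class, and the linear policy are hypothetical, and a linear map stands in for the deep neural network the text mentions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class DataCenterState:
    """A tiny, hypothetical subset of the sensor readings that make up the state."""
    outdoor_air_temp_f: float
    supply_air_temp_f: float
    cold_aisle_temp_f: float
    hot_aisle_temp_f: float
    it_load_kw: float

    def features(self) -> List[float]:
        # Leading 1.0 is a bias term for the linear policy below.
        return [1.0, self.outdoor_air_temp_f, self.supply_air_temp_f,
                self.cold_aisle_temp_f, self.hot_aisle_temp_f, self.it_load_kw]


# The policy maps a state to an action (here, a supply airflow setpoint in CFM).
Policy = Callable[[DataCenterState], float]


def make_linear_policy(weights: Sequence[float], min_cfm: float, max_cfm: float) -> Policy:
    """Build a clamped linear policy; a stand-in for a learned neural network."""
    if len(weights) != 6:
        raise ValueError("expected one weight per feature (6 total)")

    def policy(state: DataCenterState) -> float:
        raw = sum(w * x for w, x in zip(weights, state.features()))
        return min(max_cfm, max(min_cfm, raw))

    return policy


# Usage: a policy that asks for 100 CFM per kW of IT load (made-up number).
policy = make_linear_policy([0.0, 0.0, 0.0, 0.0, 0.0, 100.0], 50_000.0, 1_000_000.0)
setpoint_cfm = policy(DataCenterState(60.0, 72.0, 74.0, 95.0, 1000.0))
```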

The idea of using AI for data center cooling optimization is not new. Various RL approaches have also been reported, such as transforming cooling optimization via deep reinforcement learning and data center cooling using model-predictive control.

However, applying the control policy determined by an online RL model could result in various risks, including breaches of service requirements and even thermal unsafety. To address this challenge, we adopted an offline, simulator-based RL approach. As illustrated in Figure 2, our RL agent operates in a simulated environment, starting from real-life historical observations, S. It then explores the action space, feeding each sampled action, A, into the simulator to predict the expected new state, S’, and reward, R. From there, it collects the pairs (S, A) that have the best reward to form a new training data set used to update the parameterized policy model.
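The loop described above (sample actions for each historical state, score them in a simulator, keep the best (S, A) pairs, then refit the policy) can be sketched as follows. Everything here is a toy under stated assumptions: `toy_simulator`, its made-up temperature and energy relations, the penalty value, and the one-parameter policy fit are hypothetical stand-ins for the physics-based simulator and the parameterized policy model.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # (it_load_kw, supply_air_temp_f): a deliberately tiny state
Simulator = Callable[[State, float], Tuple[State, float]]


def toy_simulator(state: State, airflow_cfm: float) -> Tuple[State, float]:
    """Hypothetical simulator: returns (next state S', reward R) for action A."""
    it_load_kw, _supply_air_temp_f = state
    cold_aisle_rise_f = 40.0 * it_load_kw / airflow_cfm   # made-up relation
    fan_energy = (airflow_cfm / 100_000.0) ** 3           # made-up, grows with airflow
    penalty = 100.0 if cold_aisle_rise_f > 0.5 else 0.0   # made-up spec violation cost
    # The toy ignores dynamics, so the next state equals the current one.
    return state, -(fan_energy + penalty)


def collect_best_pairs(states: Sequence[State], simulator: Simulator, rng: random.Random,
                       samples_per_state: int = 500, min_cfm: float = 50_000.0,
                       max_cfm: float = 200_000.0) -> List[Tuple[State, float]]:
    """For each historical state S, keep the sampled action A with the highest reward."""
    dataset = []
    for s in states:
        candidates = [rng.uniform(min_cfm, max_cfm) for _ in range(samples_per_state)]
        best_action = max(candidates, key=lambda a: simulator(s, a)[1])
        dataset.append((s, best_action))
    return dataset


def fit_cfm_per_kw(dataset: Sequence[Tuple[State, float]]) -> float:
    """Least-squares fit of airflow = k * it_load_kw; stands in for a policy update."""
    numerator = sum(action * s[0] for s, action in dataset)
    denominator = sum(s[0] ** 2 for s, _ in dataset)
    return numerator / denominator


historical_states = [(800.0, 70.0), (1000.0, 75.0), (1200.0, 80.0), (1500.0, 72.0)]
dataset = collect_best_pairs(historical_states, toy_simulator, random.Random(0))
cfm_per_kw = fit_cfm_per_kw(dataset)
```

Because every candidate action is scored in simulation, no untested setpoint ever reaches real equipment during training, which is the safety argument for the offline approach.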

Figure 2: Our simulator-based offline RL approach.

Our simulator is a physics-based model of building energy use that takes as inputs time series such as weather data, IT load, and setpoint schedules. The model is built with data center building parameters, including geometry, construction materials, HVAC, system configurations, component efficiencies, and control strategies. It uses differential equations to output the dynamic system response, such as the thermal load and resulting energy use, along with related metrics like cold aisle temperature and differential pressure profiles.
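To give a feel for what "differential equations that output the dynamic system response" can mean, here is a minimal sketch of a single lumped thermal equation integrated with the forward Euler method. It is not the simulator described above: the single well-mixed air volume, the thermal capacitance, and all input values are assumptions, and the air properties are standard approximations near room temperature.

```python
from typing import List

AIR_DENSITY_KG_M3 = 1.2   # approximate density of air near room temperature
AIR_CP_J_KG_K = 1005.0    # approximate specific heat of air


def simulate_hot_aisle_temp_c(it_load_w: float, airflow_m3_s: float, supply_temp_c: float,
                              initial_temp_c: float,
                              thermal_capacitance_j_per_k: float = 5.0e6,  # hypothetical
                              dt_s: float = 1.0, steps: int = 1000) -> List[float]:
    """Integrate C * dT/dt = Q_it - rho * cp * V * (T - T_supply) with forward Euler."""
    # Heat carried away per kelvin of temperature difference (W/K).
    conductance_w_per_k = AIR_DENSITY_KG_M3 * AIR_CP_J_KG_K * airflow_m3_s
    temp_c = initial_temp_c
    trajectory = [temp_c]
    for _ in range(steps):
        heat_removed_w = conductance_w_per_k * (temp_c - supply_temp_c)
        dtemp_dt = (it_load_w - heat_removed_w) / thermal_capacitance_j_per_k
        temp_c += dt_s * dtemp_dt
        trajectory.append(temp_c)
    return trajectory


def steady_state_temp_c(it_load_w: float, airflow_m3_s: float, supply_temp_c: float) -> float:
    """Temperature at which heat removed by the airflow equals the IT heat load."""
    return supply_temp_c + it_load_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_KG_K * airflow_m3_s)


# Usage: 1 MW of IT load, 100 m^3/s of supply air at 25 C (illustrative numbers).
trajectory = simulate_hot_aisle_temp_c(1.0e6, 100.0, 25.0, 25.0)
```

Even this toy shows the trade-off the RL agent is tuning: lowering the airflow saves fan energy but raises the steady-state air temperature.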

The simulator plays an important role here, since our goal is to optimize energy and water usage while keeping data center conditions within specifications so hardware performance isn’t affected. More specifically, we want to keep the rise in cold aisle temperature below a certain threshold, or maintain a positive pressurization from cold aisle to hot aisle, to minimize the parasitic heat caused by recirculation.

Furthermore, the physics-based simulator enables us to train the RL model with all possible scenarios, not only those present in the historical data. This increases reliability during outlier events and allows for rapid deployment in newly commissioned data centers.

In 2021, we started a pilot at one of Meta’s data center regions, with the RL model directly controlling the supply airflow setpoint. Figure 3 compares the new setpoint, in cubic feet per minute (CFM), shown as the purple line, with the original BMS setpoint, shown as the dotted blue line, over one week for illustration purposes.

Figure 3: A comparison of the RL model versus the original BMS setpoint.

The fluctuation is mainly determined by the supply air temperature and server load cycles at different times of day. More importantly, as shown in Figure 4, the data center temperature conditions never went out of spec with the reduced airflow supply, as seen in both the cold aisle average and maximum temperatures compared against the supply air temperature.

Figure 4: A data center temperature profile under RL model control.

The CFM savings vary noticeably under different supply air temperatures, as the univariate chart in Figure 5 shows. The CFM savings can easily be converted to savings in the energy used by the supply fans. Under hot and dry conditions, when evaporative cooling or humidification is required, using less air results in less water usage as well. Over the past couple of years of the pilot, we were able to reduce the supply fan energy consumption by 20% and water usage by 4%, on average, across various weather conditions.
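The text does not say how CFM savings were converted to fan energy savings. One common idealization is the fan affinity law, under which fan power scales with the cube of airflow. The sketch below assumes that law holds exactly (fixed system curve, constant fan efficiency), which real fan arrays only approximate; the function names are hypothetical.

```python
def fan_power_savings_fraction(airflow_reduction_fraction: float) -> float:
    """Ideal fractional fan power savings for a fractional airflow reduction.

    Uses the fan affinity law: power ratio = (flow ratio) ** 3.
    """
    if not 0.0 <= airflow_reduction_fraction < 1.0:
        raise ValueError("airflow reduction must be in [0, 1)")
    return 1.0 - (1.0 - airflow_reduction_fraction) ** 3


def airflow_reduction_for_power_savings(power_savings_fraction: float) -> float:
    """Inverse of the relation above: airflow cut needed for a given power saving."""
    if not 0.0 <= power_savings_fraction < 1.0:
        raise ValueError("power savings must be in [0, 1)")
    return 1.0 - (1.0 - power_savings_fraction) ** (1.0 / 3.0)


# Usage: under this idealization, a 7% airflow cut saves a bit under 20% of fan power.
savings = fan_power_savings_fraction(0.07)
```

The cubic relationship is why modest airflow reductions can translate into comparatively large fan energy savings.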

Figure 5: A breakdown of airflow savings at different supply air temperatures.

This effort has opened the door to transforming how our data centers operate. By introducing automated predictions and continuous optimization for tuning environmental conditions in our data centers, we can bend the cost curve and reduce effort on labor-intensive tasks.

Meta is breaking ground on new types of data centers that are designed to optimize for artificial intelligence. We plan to apply the same methodology presented here to our future data centers at the design phase to help ensure they are optimized for sustainability from day one of their operations.

We’re also currently rolling out our RL approach to data center cooling to our existing data centers. Over the next couple of years, we expect to achieve significant energy and water usage savings to contribute to Meta’s long-term sustainability goals.

We would like to thank our partners in IDC Facility Operations (Butch Howard, Randy Ridgway, James Monahan, Jose Montes, Larame Cummings, Gerson Arteaga Ramirez, John Fabian, and many others) for their support.