Immersion Server Liquid Cooling: ZTE Makes a Splash at MWC
by Ian Cutress on March 12, 2018 11:30 AM EST
Big data centers are often cooled by air, using large HVAC/air-conditioning plants. Centers near the Arctic Circle can lean on the outside air to help. If an operator invests properly, especially with a specific design and layout in mind, then water cooling is another investment that can be made. And if a designer really wants to go off the deep end, full immersion liquid cooling is a possibility.
Immersion liquid cooling is ultimately not that new, and relies on non-conductive liquids. It allows the full system to be cooled - all of the components, all of the time - removes the need for large cooling apparatus, and encourages energy recycling, which is a major metric for data center owners. For data centers limited by space, it also offers better density of server nodes in a confined area, ideal for deployments at the edge of communication networks.
There are two approaches to immersion cooling: non-phase-change and phase-change. The first, non-phase-change, uses a liquid with a high heat capacity and cycles it through a heat exchange system. The downside is that such liquids (mineral oil, for example) often have a high viscosity, requiring a lot of energy to forcibly circulate. By contrast, the phase-change variety is, for most purposes, self-convecting.
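To get a feel for why pumping matters in the non-phase-change approach, a simple heat-balance sketch helps: the flow rate required to carry away a given power scales inversely with the fluid's heat capacity. The fluid properties and the 10 kW rack below are illustrative assumptions, not figures from ZTE:

```python
# Rough single-phase (non-phase-change) cooling estimate.
# Assumed representative fluid properties, not ZTE's actual numbers:
#   mineral oil: cp ~ 1900 J/(kg.K), density ~ 0.85 kg/L
#   water:       cp ~ 4180 J/(kg.K), density ~ 1.0  kg/L

def flow_rate_lpm(power_w, cp_j_per_kg_k, density_kg_per_l, delta_t_k):
    """Volumetric flow (litres/minute) needed to carry away power_w
    with a coolant temperature rise of delta_t_k."""
    mass_flow = power_w / (cp_j_per_kg_k * delta_t_k)   # kg/s
    return mass_flow / density_kg_per_l * 60.0          # L/min

# A hypothetical 10 kW rack with a 10 K allowed coolant temperature rise:
oil = flow_rate_lpm(10_000, 1900, 0.85, 10)    # ~37 L/min of mineral oil
water = flow_rate_lpm(10_000, 4180, 1.0, 10)   # ~14 L/min of water
print(f"mineral oil: {oil:.0f} L/min, water: {water:.0f} L/min")
```

Oil's lower heat capacity demands more than twice the flow of water, and its far higher viscosity makes each litre more expensive to move, which is exactly the pumping-energy penalty described above.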
The idea here is that the liquid in use changes from a liquid to a gas as it is warmed by the component. The gas then rises to a cool surface (such as a cold radiator), condenses, and falls back down, as it is now cooler again. The energy transferred into the radiator can then be cycled into an energy recovery system. The low viscosity of the phase-change material aids the convection significantly, with the creation of a large volume of low-density gas displacing the liquid to drive that convection.
The formation of the gas ultimately displaces liquid in contact with the hot surfaces, such as heatsinks or, as we'll discuss in a bit, bare processors. Forming a gas at the processor reduces the amount of liquid in contact with the heat spreader, restricting the overall cooling ability. Over the last 10 years, this phase-change immersion implementation has evolved, with liquids developed that have a suitably low viscosity but a boiling point good enough to easily cool hardware well in excess of 150 W. If you have ever seen us utter the words '3M Novec' or 'Fluorinert', these are the families of liquids we are talking about: low-viscosity, medium-sized organic molecules engineered with specific chemical groups or halogens to fit the properties needed, or blends of liquids that can be adjusted to fit the requirements. Bonus points for being completely non-toxic as well.
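A back-of-the-envelope calculation shows how much liquid actually boils off at these power levels. The latent heat of vaporisation used here (~88 kJ/kg) is a representative figure for fluids in the Fluorinert/Novec family, not a quoted specification for ZTE's liquid:

```python
# Phase-change boil-off estimate. 88 kJ/kg is an assumed, representative
# latent heat for a Fluorinert/Novec-class fluid, not ZTE's spec.
LATENT_HEAT_J_PER_KG = 88_000

def boil_off_g_per_s(power_w, latent_heat=LATENT_HEAT_J_PER_KG):
    """Grams of liquid vaporised per second to absorb power_w.
    The vapour condenses on the radiator and drips back down,
    so the fluid is recirculated rather than consumed."""
    return power_w / latent_heat * 1000.0

print(boil_off_g_per_s(150))    # a ~150 W CPU boils off roughly 1.7 g/s
print(boil_off_g_per_s(1000))   # each kW of rack load: roughly 11.4 g/s
```

Only a few grams of fluid per second need to change phase per CPU, which is why a quiet, pump-free convection loop can keep up with dense hardware.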
As mentioned, this is not a new concept. Companies have displayed this technology at events for years, but whenever a non-tech journalist writes about it, it seems to spread like wildfire. In the world of cool demonstrations at trade shows, this seems to fare better than liquid nitrogen overclocking. Making it a commercial product, however, is another thing entirely. We saw GIGABYTE's server division demonstrate a customer layout back at Supercomputing 2015, and the PEZY group showed a super-high-density implementation with its custom core designs at Supercomputing 2017, both showing what is possible with tight cooperation. ZTE's demonstration at Mobile World Congress was specifically designed to show potential customers its ability to offer dense computing with more efficient cooling methods, should anyone want to buy it.
A few things made ZTE's demonstration a little different from those we have seen before. Much to my amazement, they wanted to talk about it! On display was a model using dual-processor Xeon E5 v4 nodes; the next generation will use Xeon Scalable. I was told that, due to the design, fiber network connections do not work properly when immersed: the distortion created by the liquid, even when a cable is in place, causes a higher than acceptable error rate, so most connections are copper, which is not affected. I was also told that they do not have a problem with the thermal capacity of the liquid, and that supporting the next generation of CPUs would be no problem.
One of the marked problems with these immersion designs is cost - the liquid used ranges from $100 to $300 per gallon. Admittedly the liquid, like the hardware, is a one-time purchase, and it can also be recycled for new products when the system is updated. Our contact at ZTE mentioned that they are working with a chemical company in China to develop new liquids that have similar properties at a tenth of the cost. It was not known whether those materials would be licensed and exclusive to ZTE, however. As a chemist, I'd love to see the breakdown of these chemicals, though most of them remain proprietary. We did get a slight hint when GIGABYTE's demo a few years ago mentioned that the Novec 72DA it used is a solution of 70% trans-1,2-dichloroethylene, 4-16% ethyl nonafluorobutyl ether, 4-6% ethyl nonafluoroisobutyl ether, and trace amounts of other similar methyl variants.
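The quoted per-gallon figures make the scale of the cost problem easy to check with some quick arithmetic. The 100-gallon tank below is a hypothetical size chosen for illustration, not a figure from ZTE:

```python
# Quick fluid-fill cost arithmetic, using the $100-$300/gallon range
# from the article. The 100-gallon tank size is a hypothetical example.

def fill_cost(gallons, price_per_gallon):
    """Total cost in dollars to fill a tank of the given size."""
    return gallons * price_per_gallon

tank = 100  # gallons, hypothetical
today_low, today_high = fill_cost(tank, 100), fill_cost(tank, 300)
tenth_low, tenth_high = fill_cost(tank, 10), fill_cost(tank, 30)
print(f"current fluids: ${today_low:,}-${today_high:,}")  # $10,000-$30,000
print(f"at 1/10 cost:   ${tenth_low:,}-${tenth_high:,}")  # $1,000-$3,000
```

At today's prices the fluid alone can rival the cost of the servers it cools, which is why a tenfold price cut would change the economics so dramatically.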
One topic that came up was the processors. As seen in the images, the tops of the heatspreaders are copper colored, indicating that an engineer has taken sandpaper to them to rub off the markings. Normally the goal is for a heatspreader to be as flat and smooth as possible, to provide the best contact through paste to the heatsink. With immersion cooling, the opposite is true: it needs to be as rough as possible. This creates a larger surface area and, more importantly, creates nucleation sites that allow the liquid to boil more easily. This avoids cavitation-style boiling, which occurs when the nucleation surface is limited and the liquid boils far more violently.
Of course, the downside to an immersion setup is the difficulty of repair and upgrades. If at all possible, the owner does not want to have to go in and replace a part: it ends up messy and potentially damaging, or requires a full set of servers to be powered down. There is ultimately no way around this, and while the issue exists with standard data center water cooling, it is more significant here. ZTE stated that this setup is aimed at edge computing, where systems might be embedded for five years or so. Assuming the components all last that long, five years is probably a good expectation for the upgrade cycle too.
Comments
YukaKun - Monday, March 12, 2018
I think it would need a pool and some scuba-diving technicians.
Singing "Under the Sea" might or might not be necessary.
lilmoe - Monday, March 12, 2018
Maybe a heat sink/block on top of the CPU would allow for even more heat dispersion?
jtd871 - Monday, March 12, 2018
The IHS is already larger than the die. Hope they're using good TIM for the IHS...
DanNeely - Monday, March 12, 2018
If they're boiling the liquid, the surface needs to be hot enough to do so; a heat sink lowering the surface temps would be counterproductive.
Santoval - Monday, March 12, 2018
I highly doubt combining passive cooling with immersion cooling would provide any benefit. Perhaps the results would be worse, since you insert an unnecessary middleman that was also specifically designed to dissipate heat into air, not liquids. Besides, you would also defeat one of the main benefits of immersion cooling: the possibility of very compact/dense designs.
sor - Thursday, March 15, 2018
As long as the heat sink is more thermally conductive than the cooling liquid, the chip would absolutely benefit from a heat sink to spread and aid the transfer of heat. In fact, that's the whole point of roughing up the lid: increase the surface area and create micro fins/pins.
Now, it may be that soldering on a 1cm tall copper finned heatsink only improves things by a tiny fraction vs. just roughing up the lid, or it might be that they actually do want to run the systems hot and boil at the surface in order to create convection currents.
DanNeely - Monday, March 12, 2018
For cloud-scale datacenters, not being able to easily service individual servers might not be a major problem either. In some of their previous data centers (I haven't seen anything about their most recent ones), MS was bringing in servers pre-assembled into shipping-container or prefab building-module sized lumps, with the intent of just connecting power, data, and cooling to the modules at setup and then never opening them until they were due to be replaced wholesale. Any dead server modules would just be shut down at the administrative level, and when the total number of dead ones got high enough, or new generations of hardware got enough better, the entire module would be pulled out as a unit and sent to the recycler.
Holliday75 - Monday, March 12, 2018
I admit it's been almost 3 years since I worked in an MS data center, but I never saw it work like this. Are you talking about Dell Nucleon racks or HP/Dell containers?
Azure was using Dell Nucleons when I left, and they were fixed on the fly when blades went down/drives failed.
DanNeely - Monday, March 12, 2018
I don't think any of the articles I read ever named the suppliers. I did find one article from '08 (longer ago than I thought) talking about the early shipping-container data centers, where the plan was to be hands-off until the entire container was yanked.
"Once containers are up and running, Microsoft's system administrators may never go inside them again, even to do a simple hardware fix. Microsoft's research shows that 20% to 50% of system outages are caused by human error. So rather than attempt to fix malfunctioning servers, it's better to let them die off. "
"As more and more servers go bad inside the container, Microsoft plans to simply ship the entire container back to the supplier for a replacement. "
If MS has since decided that in-place repair is worth a larger on-site staff, well, that's why I noted not having seen anything about how they run their more recent centers. *shrug*
GeorgeH - Monday, March 12, 2018
That's the first time I've seen a windowed computer case I actually want.
The big headline here is that they think they can get the fluid down to $10 a gallon, though - assuming they hit that price point with good performance characteristics, immersion cooling could finally go mainstream.