Nvidia Blackwell and the future of data center cooling

by info.odysseyx@gmail.com, November 25, 2024

Nvidia faced scrutiny this month after some servers with 72 Blackwell processors overheated. The problem arose because some early OEM installations were not properly water-cooled, an issue Lenovo has aggressively identified and mitigated with its Neptune warm-water-cooling solutions.

As AI advances, we will need ever denser, more powerful AI processors, which suggests that air cooling in server rooms may become obsolete. Let’s talk about Blackwell, water cooling, and why Lenovo’s Neptune solution stands out right now. We’ll close with my Product of the Week: Microsoft’s Windows 365 Link, which may be the missing link between PCs and terminals and could change desktop computing forever.

Blackwell

Blackwell is Nvidia’s premier, AI-focused GPU. When it was announced, it was so far beyond what most thought was practical that it seemed more like a pipe dream than a solution. But it works, and there’s nothing close to its class right now.

However, it is massively dense in terms of technology and generates a lot of heat. Some argue that this makes it a potential environmental disaster. Don’t get me wrong: it will draw a lot of power and generate a lot of heat. But it does so much more work than the conventional parts that would otherwise handle these loads that it is relatively economical to run.

It’s like comparing a semi-truck with three trailers to a U-Haul van. Yes, the semi will get relatively poor gas mileage, but it will hold more cargo than 10 U-Haul vans and use far less gas than those 10 vans, making it more environmentally friendly. So it is with Blackwell. It is so far ahead of its competition in performance that its relatively high power consumption is still less than what a competing AI server would need to do the same work.
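
The arithmetic behind the truck analogy can be sketched in a few lines of Python. This is only an illustration: the `Hauler` class, the `total_fuel` helper, and every number below are hypothetical, chosen to mirror the analogy (one semi carrying 10 van-loads while burning several times the fuel of a single van per trip). None of them are measured figures for trucks or for Blackwell.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hauler:
    """A vehicle (or, by analogy, a server) described by capacity and consumption."""
    name: str
    cargo_per_trip: float  # units of cargo moved per trip (hypothetical)
    fuel_per_trip: float   # units of fuel burned per trip (hypothetical)


def total_fuel(hauler: Hauler, cargo: float) -> float:
    """Fuel needed to move `cargo` units, counting whole trips only."""
    trips = math.ceil(cargo / hauler.cargo_per_trip)
    return trips * hauler.fuel_per_trip


# Illustrative numbers only: the semi carries 10 van-loads per trip
# and burns 4x the fuel of one van per trip.
semi = Hauler("semi with three trailers", cargo_per_trip=10.0, fuel_per_trip=4.0)
van = Hauler("U-Haul van", cargo_per_trip=1.0, fuel_per_trip=1.0)

job = 10.0  # van-loads of cargo to move
semi_fuel = total_fuel(semi, job)
van_fuel = total_fuel(van, job)

print(f"{semi.name}: {semi_fuel} fuel units for {job} loads")
print(f"{van.name}: {van_fuel} fuel units for {job} loads")
```

With these made-up inputs, the semi burns more per trip yet less for the whole job. The same comparison applies to a server if you read "cargo" as units of AI work and "fuel" as energy, which is the sense in which a power-hungry part can still be the more economical one.
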
But Blackwell chips run hot, and most servers today are air-cooled. So, it’s no surprise that some Blackwell servers were configured with air cooling and overheated in racks with 72 or more Blackwell processors. While 72 Blackwells in a rack is uncommon today, it will become more common as AI advances, since Nvidia is currently the king of AI.

You can only go so far in performance with air-cooled technology before you have to move to liquid cooling. Although Nvidia responded to this problem with a water-cooled rack specification, which Dell is now using, Lenovo was way ahead of the curve with its Neptune water-cooling solution.

Lenovo Neptune

Lenovo was first to recognize this, mainly because it is currently the market leader in its class when it comes to water cooling, a technology it originally acquired from IBM, which has worked on water cooling for decades.

What is important in water cooling is not just the technology but the knowledge of how to install it safely. Mixing water and high-amperage electronics can be a disaster if you don’t know what you’re doing. As a result of the IBM server acquisition, Lenovo has decades of water-cooling experience behind the solution it calls Neptune.

Given that Nvidia has specified a water-cooled rack, what makes Neptune better? The answer is experience. Most of those who will use Nvidia’s specified solution, Nvidia included, do not often deploy water cooling. As a result, especially with this high-end Blackwell implementation, they will essentially learn on the job.

That is risky, because water and electricity do not mix. A leak can fry not only an expensive part or even an entire rack but also any person who happens to be present, if the breakers are not set properly. In a raised-floor environment, unless it is designed with leaks in mind, terrible things can happen.
I observed this myself when I was at IBM decades ago. It turned out that nobody had stress-tested the water-cooling system for our massive (for the time) data center. The site lost a transformer, which shut down the water-cooling system, and the system had not been pressure-tested for a sudden shutdown. Pipes burst, and the data center turned into a dangerous swimming pool. Hundreds of millions of dollars’ worth of hardware was lost, and the building flooded, causing additional damage.

With such experience, IBM became the leading OEM for safe water cooling, and Lenovo acquired that knowledge and experience when it purchased IBM’s x86 server group. Now Lenovo, along with IBM, knows how to make water cooling work best, which means you can rest assured that a Lenovo Blackwell server won’t overheat or suddenly start leaking.

Also, Lenovo’s expertise is in warm-water cooling, a much safer and much less expensive way to cool servers than cold-water cooling, which requires huge, inefficient evaporators or chillers.

Implementing this technology is no trivial task. Unlike water-cooled automobiles or PCs, servers must be hot-swappable, which means you need exceptional and highly tested drip-free connections, proactive warnings, preventive maintenance schedules based on past knowledge of the components, and technicians experienced in working with this level of water-cooling technology.

Wrapping Up: The Future of Hot-Water-Cooled Data Centers

Blackwell is the first of these incredibly powerful processors to hit the market as AI pushes the envelope. Nvidia’s competitors are pushing for something similar, which suggests that all servers will eventually have to be cooled with hot water. This positions Lenovo nicely for a water-cooled future, regardless of the processor technology, while its competitors try to catch up.

One benefit I expect technicians to enjoy is reduced data center noise.
The amount of air you have to push through an air-cooled server is huge, and it makes today’s data centers a noise nightmare. As hot-water cooling moves more aggressively into the market, these data centers will become quieter and much more pleasant places to work, which would make many of us far happier to work in them.

Windows 365 Link

Ever since we replaced terminals with PCs, IT has wanted the terminal experience back. Terminals were like pre-smart TVs: you didn’t have to patch or upgrade the OS or deal with the “blue screen of death.” If the thing broke, it was pretty easy to fix or relatively cheap to replace. From an IT perspective, terminals were a ton better than PCs.

But from the user’s perspective, terminals sucked. You couldn’t run what you wanted without IT support, and it could take months for IT to respond to a request. Terminals were connected to aging mainframes that could not run modern applications at the time (they can now). New applications were usually custom-built, but a communication gap between users and IT often led to problems. Users struggled to articulate their requirements, and IT often failed to press for better specifications, so the resulting applications were frequently unusable.

Last week at Microsoft Ignite, Microsoft announced Windows 365 Link, which may be the closest thing yet to a terminal with PC-like features and performance, although it is fully wired (there is no laptop solution yet). While we call this class of device a thin client, Microsoft calls it a cloud PC. At $349 and the size of a micro-PC, it seems the closest we’ve seen to a near-perfect PC/terminal mix.

Windows 365 Link will be more reliable, cheaper, more secure, and much smaller than most desktop PCs, making it very attractive to IT. At the same time, it connects to a cloud PC instance, giving the user a very PC-like experience.
For now, it targets only enterprise accounts, mainly because they have the greatest need and the infrastructure to support it. I see it moving into travel, education, government, manufacturing, and other vertical markets with similar needs. Although it does not yet address mobile users, fully deployed 5G and the upcoming 6G specification should allow for future mobile implementations.

Given that Microsoft is the company that helped launch the PC and make terminals obsolete, it seems both ironic and poetic that Microsoft may eventually lead the way in making PCs obsolete. We will see if that happens. For now, the Windows 365 Link is my Product of the Week.