What is Azure HPC? – Microsoft Community Hub
by info.odysseyx@gmail.com, August 22, 2024

Our overall mission has been to democratize access to supercomputing. We have always believed that people can do great things when they have access to high-performance computing. So while hosting the AI revolution was never the plan, its very existence has vindicated our strategy of sitting at the intersection of two movements in IT, cloud computing and Beowulf clusters, and solving the biggest obstacle to supercomputing accessibility.

When it comes to resource allocation, the cloud is all about adapting to customer demand as efficiently as possible. Whether you're a startup that begins small but needs to scale smoothly as the business grows, or a retailer that needs extra capacity for the holiday rush but doesn't want to pay for it all year long, cloud providers absorb these swings over time and across customers through virtualization. Instead of installing new hardware for each deployment, the cloud takes powerful servers and runs hypervisor software that can split them into smaller virtual machines on the fly. These virtual machines can be created, moved between servers, and destroyed in seconds, and the cloud differentiates itself by giving customers virtually unlimited flexibility in these operations.

Since NASA researchers built the first Beowulf supercomputer by linking PCs together over Ethernet in 1994, clustering commodity hardware has become an increasingly popular way to build supercomputers. The main driver of this shift is economies of scale: custom designs tend to use each processor more efficiently, but if you can buy a Raspberry Pi at half the price per FLOPS, reaching even 60% efficiency is a win. In this context, the efficiency of a system is the ratio of how fast it can do useful work to the sum of peak FLOPS across all its parts.
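The efficiency metric defined above can be made concrete with a small sketch; the node count, peak rating, and sustained throughput here are made-up numbers for illustration:

```python
def cluster_efficiency(sustained_flops: float, peak_flops_per_node: float, nodes: int) -> float:
    """Efficiency = rate of useful work / sum of peak FLOPS across all parts."""
    return sustained_flops / (peak_flops_per_node * nodes)

# A hypothetical commodity cluster: 1,000 nodes at 10 TFLOPS peak each
# (10 PFLOPS aggregate), sustaining 6 PFLOPS on a real workload.
eff = cluster_efficiency(sustained_flops=6e15, peak_flops_per_node=10e12, nodes=1000)
print(f"{eff:.0%}")  # 60%
```

At half the price per FLOPS, this 60%-efficient cluster still delivers more useful work per dollar than a fully utilized custom design.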
Beowulf clusters tend to underperform on this metric, mainly because custom designs use silicon that is more optimized for the target workload, and because network bottlenecks often leave Beowulf nodes stuck waiting for data. As Beowulf-style clusters gained popularity, specialized accelerators and networking technologies emerged to close the gap. Accelerators such as GPUs target specific types of computation, while network protocols such as InfiniBand with RDMA are optimized for the static, closed nature of these "backend" networks and the communication patterns they carry.

In addition to generally lowering costs, Beowulf clusters are modular enough that the cloud's virtualization playbook can be re-run on them. Instead of using a hypervisor to partition a set of cores, we use network partitions to partition a set of nodes. For customers, this means creating a group of "RDMA-capable" VMs with settings that instruct Azure to place them on the same network partition. This has enabled us to deliver supercomputing in the cloud, which presents exciting opportunities for our customers. On your own cluster, you would have to pay twice as much for a cluster twice the size to complete a job in half the time. In the cloud, you rent the larger cluster for only half the time, so the cost stays the same. In a sense, every deadline becomes a holiday rush, and in industries where R&D relies heavily on supercomputing and time to market is everything, Azure HPC has become popular.

By 2022, that story could have ended, but with the rise of AI we now serve industries that use supercomputing not only for R&D (training) but also for production (inference), making Azure HPC an attractive platform for this generation of startups. In a nutshell, Azure HPC provides customers both small and large with the latest server and accelerator technologies, connected by high-performance backend interconnects and delivered through the Azure cloud.
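The cost arithmetic above can be sketched as follows; the node counts and hourly rate are hypothetical, and perfect scaling is assumed:

```python
# In the cloud, cost = nodes * hours * hourly rate, so doubling the node
# count while halving the runtime leaves the bill unchanged.

def job_cost(nodes: int, hours: float, rate_per_node_hour: float) -> float:
    return nodes * hours * rate_per_node_hour

baseline = job_cost(nodes=100, hours=10, rate_per_node_hour=3.0)  # 100 nodes for 10 h
scaled = job_cost(nodes=200, hours=5, rate_per_node_hour=3.0)     # 2x nodes, half the time
print(baseline, scaled)  # 3000.0 3000.0 -- same cost, result in half the time
```

In practice scaling is never perfect, which is one reason the backend interconnect matters so much: any scaling loss shows up directly as extra rented node-hours.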
I thought I'd write this because inside Azure we often refer to ourselves as "the InfiniBand team," "Azure GPU," or "the AI platform." All three, the network, the accelerators, and the use case, have had tremendous success and will continue to, but if any one of them were to change, we would still be Azure HPC.