IBM has built a cost-effective AI supercomputer in its cloud
IBM’s answer to the cost-effective supercomputer has been around for several months, but the company has only recently released tangible information about its so-called Vela project.
In a post on its blog, IBM revealed that the research, written by five of the company’s employees, addresses the shortcomings of previous supercomputers and their lack of readiness for AI tasks.
The company also sheds some light on the decisions it made to adapt the supercomputing model for this type of workload, including its use of affordable yet powerful hardware.
IBM’s Vela AI supercomputer
The work emphasizes that “building a [traditional] supercomputer meant bare metal nodes, powerful networking hardware… parallel file systems and other items commonly associated with high-performance computing (HPC).”
While these supercomputers can clearly handle heavy AI workloads, including the one designed for OpenAI, the start-up behind the popular ChatGPT chatbot, a lack of optimization means they fall short in some areas, while a surplus in others leads to unnecessary expenditure.
While it has long been believed that bare metal nodes are ideal for AI, IBM wanted to explore whether the nodes could instead be offered as virtual machines (VMs). The result, according to Big Blue, is a huge performance gain.
“After a significant amount of research and discovery, we came up with a way to expose all the capabilities of the node (GPUs, CPUs, networking and storage) to the VM so that the virtualization overhead is less than 5%, which is the lowest overhead in the industry we know.”
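To make the "less than 5%" figure concrete, virtualization overhead is commonly expressed as the fraction of bare-metal performance lost when the same workload runs inside a VM. The sketch below is illustrative only: the function name and the throughput numbers are hypothetical and are not IBM's measurements or methodology.

```python
def virtualization_overhead(bare_metal_throughput: float, vm_throughput: float) -> float:
    """Return the fraction of bare-metal throughput lost when running in a VM."""
    if bare_metal_throughput <= 0:
        raise ValueError("bare_metal_throughput must be positive")
    return (bare_metal_throughput - vm_throughput) / bare_metal_throughput


# Hypothetical throughput figures (e.g. training samples per second),
# chosen only to illustrate the calculation.
bare_metal = 1000.0
vm = 960.0

overhead = virtualization_overhead(bare_metal, vm)
print(f"overhead: {overhead:.1%}")  # prints "overhead: 4.0%"
```

With these assumed numbers, the VM reaches 96% of bare-metal throughput, which would fall under the 5% threshold IBM describes.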
As for the node design, Vela comes packed with 80 GB of GPU memory, 1.5 TB of DRAM and four 3.2 TB NVMe storage drives.
The Next Platform estimates that if IBM were to enter its supercomputer in the Top500 ranking, it would deliver about 27.9 petaflops of performance, placing it 15th in the November 2022 list.
While today’s supercomputers are capable of handling AI workloads, massive advances in artificial intelligence, coupled with the pressing need for cost-efficiency, underline the need for such a machine.