AMD wants to cut communication costs and is on a crusade to significantly reduce the cost of moving bits between memory and compute by placing RAM on top of the CPU/GPU.
Dr. Lisa Su, the company's CEO, recently gave a high-level presentation at the International Solid-State Circuits Conference (ISSCC) 2023, speaking at length about the need to decrease the amount of energy (expressed in joules) spent per floating-point operation.
Otherwise, as she puts it, the next zettaflop-capable supercomputer will need a nuclear power plant to keep running, and that is neither realistic nor sustainable.
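The arithmetic behind that warning is simple: sustained power equals operations per second multiplied by energy per operation. The short Python sketch below works it through for a one-zettaflop machine. The efficiency figures in it are illustrative round numbers chosen for this example, not values taken from Su's presentation, and the function names are made up for the sketch.

```python
# Back-of-the-envelope sketch: power draw of a zettaflop machine.
# All efficiency values are illustrative assumptions, not AMD figures.

ZETTAFLOPS = 1e21  # one zettaflop = 10**21 floating-point operations per second


def joules_per_flop(gflops_per_watt):
    """Convert an efficiency in GFLOPS/W into energy per operation (joules)."""
    # 1 W = 1 J/s, so (FLOP/s) per watt is FLOP per joule; invert it.
    return 1.0 / (gflops_per_watt * 1e9)


def system_power_watts(flops_per_second, gflops_per_watt):
    """Sustained power = operations per second * energy per operation."""
    return flops_per_second * joules_per_flop(gflops_per_watt)


if __name__ == "__main__":
    # Hypothetical efficiency levels, in GFLOPS per watt.
    for efficiency in (50.0, 500.0, 1000.0):
        gigawatts = system_power_watts(ZETTAFLOPS, efficiency) / 1e9
        print(f"{efficiency:7.0f} GFLOPS/W -> {gigawatts:6.1f} GW")
```

At an assumed 50 GFLOPS per watt, a zettaflop machine would draw about 20 GW; getting down to 1 GW, roughly the scale of a large power plant, would take 1,000 GFLOPS per watt. That gap is why energy per operation, rather than raw speed, is the number Su wants to shrink.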
Distance
The biggest improvements in performance per watt, Su believes, will come from reducing the physical distance between memory and the place where computation happens (the CPU or GPU). She used the example of the MI300 accelerator, a next-generation AMD Instinct APU with unified high-bandwidth memory (HBM), which achieves significant energy savings.
At the same time, AMD has already integrated processing-in-memory to reduce the energy required to access data.
Su’s presentation stated, “Important algorithmic kernels can run directly in memory, saving precious communications energy” – and for that, AMD is teaming up with Samsung Electronics, whose expertise in DRAM is undeniable.
Closer is better
Memory-on-chip is already mainstream: AMD packs it into the Ryzen 9 7950X3D and, before it, the Ryzen 7 5800X3D (note that this memory is the faster and more expensive SRAM rather than DRAM). HBM is present in AMD's Instinct MI accelerators and in Nvidia's popular A100 accelerator, the brain behind ChatGPT. Apple's M series also places its memory next to the processor, but on the package rather than on the chip.
Eventually, HPC will move toward full-scale memory-on-chip. It is the lowest-hanging fruit, as workloads that demand extremely large amounts of high-bandwidth memory push power requirements (and the associated costs) higher up the priority list.
Fujitsu’s A64FX processor, launched in 2019, was a true trailblazer, merging dozens of Arm cores with 32 GB of in-package HBM2 memory and offering a whopping 1 TBps of bandwidth. With HBM3 already available on Nvidia's Hopper H100 enterprise GPU, it gets even more interesting. Rambus hinted last April at plans to go beyond the HBM3 spec, with bandwidth of up to 1.05 TBps.
Growing interest in HBM, the clout of the 800-pound gorilla that is Apple, and the never-ending quest for bandwidth without an exotic power supply (and an equally exotic cooling system) mean that HBM will probably, in the long run, supplant DIMM (and GDDR) as the main memory format: blame Apple.
Dr. Su expects the first Zettascale supercomputer to be unveiled before 2035: that gives us 12 years to find the perfect solution unless AI arrives there first.