MemVerge, a provider of software designed to accelerate and optimize data-intensive applications, is partnering with Micron to improve the performance of LLMs using Compute Express Link (CXL) technology.
The company’s Memory Machine software uses CXL to reduce the time GPUs sit idle while waiting for data to load from memory.
The technology was demonstrated at Micron’s booth at Nvidia GTC 2024. Charles Fan, CEO and co-founder of MemVerge, said: “Cost-effectively scaling LLM performance means fueling the GPUs with data. Our demo at GTC shows that pools of tiered memory not only increase performance, but also maximize the use of precious GPU resources.”
Impressive results
The demo used a high-throughput FlexGen generation engine and an OPT-66B large language model. This was performed on a Supermicro Petascale Server equipped with an AMD Genoa CPU, Nvidia A10 GPU, Micron DDR5-4800 DIMMs, CZ120 CXL memory modules, and MemVerge Memory Machine X intelligent tiering software.
The demo contrasted the performance of a job running on an A10 GPU with 24 GB of GDDR6 memory and data fed by 8x 32 GB Micron DRAM, with the same job running on the Supermicro server equipped with a Micron CZ120 CXL 24 GB memory expander and the MemVerge software.
The FlexGen benchmark, run with tiered memory, completed tasks in less than half the time of traditional NVMe storage methods. Additionally, GPU utilization increased from 51.8% to 91.8%, reportedly as a result of MemVerge Memory Machine X software’s transparent data tiering across GPU, CPU, and CXL memory.
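To put the reported utilization figures in perspective, the short Python sketch below works through the arithmetic. It uses only the two percentages quoted above; the function name and the derived metrics are illustrative choices, not anything published by MemVerge or Micron.

```python
def utilization_summary(before_pct: float, after_pct: float) -> dict:
    """Derive simple comparisons from two GPU utilization percentages."""
    return {
        # Absolute change, in percentage points.
        "point_gain": round(after_pct - before_pct, 1),
        # Relative change in useful GPU time, as a percentage.
        "relative_gain_pct": round((after_pct / before_pct - 1) * 100, 1),
        # Share of time the GPU sat idle in each run.
        "idle_before_pct": round(100 - before_pct, 1),
        "idle_after_pct": round(100 - after_pct, 1),
    }


# Figures reported for the GTC demo: 51.8% without tiering, 91.8% with it.
summary = utilization_summary(51.8, 91.8)
for name, value in summary.items():
    print(f"{name}: {value}")
```

On these numbers, the gain is 40 percentage points, or roughly 77% more useful GPU time, while idle time falls from 48.2% to 8.2%. These are derived from the reported figures only and say nothing about how the results would carry over to other GPUs or models.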
Raj Narasimhan, senior vice president and general manager of Micron’s Compute and Networking Business Unit, said: “Our collaboration with MemVerge allows Micron to demonstrate the substantial benefits of CXL memory modules to improve effective GPU throughput for AI applications, which results in faster time-to-insights for customers. Micron’s innovations across its memory portfolio provide compute with the necessary memory capacity and bandwidth to scale AI use cases from the cloud to the edge.”
However, experts remain skeptical about the claims. Blocks and Files pointed out that the Nvidia A10 GPU uses GDDR6 memory, which is not HBM. A spokesperson for MemVerge responded to this point, and others the site raised, saying: “Our solution has the same effect on the other GPUs with HBM. Between the memory offloading capabilities of FlexGen and the memory tiering capabilities of Memory Machine