Microsoft backed a small hardware startup that just launched its first AI processor, which does inference without a GPU or expensive HBM, and a key Nvidia partner is working with it
- Microsoft-backed startup introduces GPU-free alternative for generative AI inference
- DIMC architecture delivers ultra-high memory bandwidth of 150 TB/s
- Corsair supports transformers, agentic AI and interactive video generation
d-Matrix Inc., a hardware startup based in Santa Clara, California, has introduced its first AI processor, Corsair, which aims to improve AI inference.
Built on the Microsoft-backed company's digital in-memory compute (DIMC) architecture, Corsair eschews traditional GPUs and expensive high-bandwidth memory (HBM), and d-Matrix claims significant performance and cost benefits as a result.
Corsair is currently available to early access customers, with wider availability planned for the second quarter of 2025.
Corsair’s achievements redefine AI inference
The Corsair processor is purpose-built to handle demanding AI inference tasks, especially for generative AI models. For example, it achieves 60,000 tokens per second at 1 ms per token when running Llama3 8B on a single server.
In more resource-intensive scenarios, such as with Llama3 70B models, Corsair delivers 30,000 tokens per second at a rate of 2 ms per token in a single rack, translating to significant energy and operational cost savings compared to traditional GPU-based solutions.
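The two sets of figures above can be related with some simple arithmetic. The sketch below assumes that the quoted throughput is an aggregate across all users and that the per-token time applies to each individual stream; the article does not state this explicitly, and the function name is purely illustrative.

```python
def implied_concurrent_streams(aggregate_tokens_per_s: float, ms_per_token: float) -> float:
    """Number of parallel streams implied by an aggregate rate and a per-stream latency.

    Assumption (not from the source): every stream emits one token each
    `ms_per_token` milliseconds, and the aggregate rate is the sum over streams.
    """
    per_stream_tokens_per_s = 1000.0 / ms_per_token
    return aggregate_tokens_per_s / per_stream_tokens_per_s


# Figures quoted in the article
llama3_8b = implied_concurrent_streams(60_000, 1.0)   # single server
llama3_70b = implied_concurrent_streams(30_000, 2.0)  # single rack

print(f"Llama3 8B:  {llama3_8b:.0f} concurrent streams")
print(f"Llama3 70B: {llama3_70b:.0f} concurrent streams")
```

Under that reading, both configurations work out to about 60 simultaneous streams, which fits the company's emphasis on interactive, multi-user workloads.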
The processor is built on Nighthawk and Jayhawk II tiles, using a 6nm manufacturing process. Each Nighthawk Tile integrates four neural cores and a RISC-V CPU, tailored to support large model inference with digital in-memory computation (DIMC) and versatile data type processing, including block floating point (BFP).
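For readers unfamiliar with block floating point, the idea is that a group of values shares a single exponent while each value keeps only a small integer mantissa. The following is a generic, minimal sketch of that technique; the 8-bit mantissa width, the function names, and the rounding scheme are illustrative assumptions and do not describe d-Matrix's actual number formats.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize one block of floats to (shared_exponent, signed integer mantissas).

    Generic illustration only; not d-Matrix's format.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1
    _, shared_exp = math.frexp(max_abs)
    # One step of the integer mantissa corresponds to this real-valued scale
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to the nearest step and clamp to the signed mantissa range
    mantissas = [max(-limit, min(limit, round(x / scale))) for x in block]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas, mantissa_bits=8):
    """Reconstruct approximate floats from a shared exponent and mantissas."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


shared_exp, mantissas = bfp_quantize([0.5, -0.25, 0.125, 0.0])
print(shared_exp, mantissas)                  # 0 [64, -32, 16, 0]
print(bfp_dequantize(shared_exp, mantissas))  # [0.5, -0.25, 0.125, 0.0]
```

Because the exponent is stored once per block rather than once per value, the arithmetic on the mantissas is plain integer math, which is one reason the format is attractive for inference hardware.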
Corsair uses chiplet packaging that integrates memory and compute to maximize efficiency. It conforms to the industry-standard full-height PCIe Gen5 card form factor and can be paired with DMX Bridge cards for scalable performance. Each card delivers 2400 TFLOPs of peak 8-bit compute, along with 2 GB of integrated performance memory and up to 256 GB of off-chip memory capacity.
Notably, Micron Technology, a key Nvidia partner, is also working with d-Matrix.
Corsair was initially planned for launch in late 2023, but d-Matrix reworked its architecture in response to rising demand for generative AI. This pivot allowed Corsair to integrate enhancements tailored to transformer models and emerging applications such as agentic AI and interactive video generation.
“We saw transformers and generative AI coming and founded d-Matrix to tackle inference challenges around the largest computing capabilities of our time,” said Sid Sheth, co-founder and CEO of d-Matrix.
“The first-of-its-kind Corsair computing platform delivers lightning-fast token generation for high interactivity applications with multiple users, making Gen AI commercially viable,” Sheth added.
Via eeNews