No one really expected Nvidia to release something like the GB10. After all, why would a tech company that transformed itself into the most valuable company ever by selling parts that cost hundreds of thousands of dollars suddenly decide to sell an entire system for a fraction of the price?
I believe Nvidia wants to revolutionize computing like IBM did almost 45 years ago with the original IBM PC.
As a reminder, Project DIGITS is a fully formed, turnkey supercomputer built into something the size of a mini PC. It’s essentially a smaller version of the DGX-1, the first of its kind, launched almost a decade ago in April 2016. The DGX-1 sold for $129,000 at the time, with a 16-core Intel Xeon CPU and eight P100 GPGPU cards; Project DIGITS costs $3,000.
Nvidia has confirmed an AI performance of 1,000 teraflops at FP4 precision (it is unclear whether that is a dense or sparse figure). While there is no direct comparison, one can estimate that the little supercomputer has roughly half the processing power of a fully loaded, eight-card Pascal-based DGX-1.
At the heart of DIGITS is the GB10 SoC, which has 20 Arm cores (10 Cortex-X925 and 10 Cortex-A725). Beyond the confirmed presence of a Blackwell GPU (a lite version of the B100), one can only estimate the power consumption (around 100W) and the memory bandwidth (825GB/s, according to The Register).
You should be able to connect two of these devices (but no more) via Nvidia’s proprietary ConnectX technology to handle larger LLMs, such as Meta’s Llama 3.1 405B. Filling a 42U rack with these little mini PCs seems highly unlikely for now, as that would encroach on Nvidia’s much more lucrative DGX GB200 systems.
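To see why a pair of units (rather than one) is needed for a model of that size, here is a rough back-of-envelope sketch. It assumes weights-only storage at a given precision and the 128GB of memory per unit listed in the table further down; a real deployment also needs headroom for the KV cache and activations.

```python
# Back-of-envelope sketch (my own estimate, not an Nvidia figure): can one or two
# 128GB DIGITS units hold the weights of Llama 3.1 405B at different precisions?
# It ignores the KV cache, activations and runtime overhead, which all add up.

PARAMS = 405e9                                           # Llama 3.1 405B parameter count
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}  # bytes per weight
UNIT_MEMORY_GB = 128                                     # memory per DIGITS unit

for precision, bytes_per_param in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * bytes_per_param / 1e9
    units_needed = weights_gb / UNIT_MEMORY_GB
    print(f"{precision}: ~{weights_gb:,.0f}GB of weights "
          f"(~{units_needed:.1f} units' worth of memory)")

# FP16: ~810GB (~6.3 units), FP8: ~405GB (~3.2 units),
# FP4: ~203GB (~1.6 units) -- too big for one box, but two (256GB) will do.
```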
It’s all about the moat
Why did Nvidia start Project DIGITS? I think it’s about strengthening the moat. Making your products so sticky that it becomes almost impossible to compete is something that worked very well for others: Microsoft and Windows, Google and Gmail, Apple and the iPhone.
The same thing happened with Nvidia and CUDA: by being in the driver’s seat, Nvidia could do things like move the goalposts and upset the competition.
The move to FP4 for inference allowed Nvidia to deliver impressive benchmark claims such as “Blackwell delivers 2.5x the performance of its predecessor in FP8 for training, per chip, and 5x with FP4 for inference”. Of course, AMD isn’t offering FP4 computation in the MI300X/325X series and we’ll have to wait until later this year before it rolls out in the Instinct MI350X/355X.
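As a rough illustration of the arithmetic behind those headline multiples, the sketch below (my own simplification, not an Nvidia benchmark) shows how bytes per parameter and theoretical peak throughput scale each time the numeric format is halved.

```python
# Simplified illustration (not an Nvidia benchmark): halving the numeric format
# halves the bytes moved per parameter and, on hardware with native support,
# roughly doubles peak math throughput. Real-world gains depend on accuracy
# trade-offs and kernel support, so treat these as theoretical ceilings.

formats_bits = {"FP16": 16, "FP8": 8, "FP4": 4}   # bits per value

baseline_bits = formats_bits["FP16"]
for name, bits in formats_bits.items():
    bytes_per_param = bits / 8
    ideal_speedup = baseline_bits / bits          # idealised, relative to FP16
    print(f"{name}: {bytes_per_param} bytes/param, ~{ideal_speedup:.0f}x peak vs FP16")
```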
Nvidia is therefore shoring up its defenses against future incursions, for lack of a better word, from existing and future competitors, including its own customers (think Microsoft and Google). The ambition of Nvidia CEO Jensen Huang is clear: he wants to extend the company’s dominance beyond the domain of hyperscalers.
“AI will be mainstream in every application and in every sector. Project DIGITS brings the Grace Blackwell Superchip to millions of developers, putting an AI supercomputer on the desks of every data scientist, AI researcher and student, empowering them to engage and shape the age of AI,” Huang recently commented.
Short of renaming Nvidia as Nvid-ai, this is as close as Huang gets to acknowledging his ambition to make his company’s name synonymous with AI, much like Tarmac and Hoover before it (albeit in more niche industries).
Like many, I was also perplexed by the Mediatek link, and the reason for it can be found in Mediatek’s press release. The Taiwanese company “brings its design expertise in Arm-based SoC performance and energy efficiency to (a) breakthrough device for AI researchers and developers,” it noted.
I believe the partnership benefits Mediatek more than Nvidia, and in the short term I see Nvidia quietly going solo. Speaking to Reuters, Huang rejected the idea of Nvidia going after AMD and Intel, saying: “Now they (Mediatek) could offer that to us, and they could keep that for themselves and serve the market. And so it was a great win-win.”
However, this doesn’t mean that Nvidia won’t offer more mainstream products, only that they would be aimed at businesses and professionals rather than consumers, where cutthroat competition makes things more challenging (and margins razor thin).
The Reuters article quotes Huang as saying, “We’re going to make it a mainstream product, we’re going to support it with all the things we do to support professional and high-end software, and the PC (manufacturers) are going to make it available to end users.”
| Specification | DIGITS | DIGITS x2.4 | DGX-1 v1 | Variance (DGX-1 vs DIGITS) |
|---|---|---|---|---|
| Depth (estimated) in mm | 89 | 89 | 866 | 9.73x |
| Width (estimated) in mm | 135 | 324 | 444 | 1.37x |
| Height (estimated) in mm | 40 | 40 | 131 | 3.28x |
| Weight in kg | ~1 | ~2.4 | 60.8 | 25.35x |
| Price USD (adjusted November 2024) | 3,000 | 7,200 | 170,100 | 23.63x |
| Performance GPU FP16 (TF) | 0 | 0 | 170 | – |
| Performance GPU FP16 Closed (TF) | ~282 | 676.8 | 680 | 1.00x |
| Performance GPU FP4 Closed (TF) | 1,000 | – | 0 | – |
| GPU memory (GB) | 128 | 307.2 | 128 | 0.42x |
| Maximum power consumption (W) | ~150 | ~300 | 3,200 | 10.67x |
| Storage (TB) | 4 | 9.6 | 7.68 | 0.80x |
| GPU family | Blackwell | Blackwell | Pascal | – |
| GPU power consumption (W) x8 | ~100 | ~240 | 2,400 | 10x |
| Number of GPU transistors (bn) x8 | ~30 | ~72 | 120 | 1.67x |
| Memory bandwidth (GB/sec) | ~850 | ~850 | 720 | 0.85x |
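For transparency, the “DIGITS x2.4” column simply scales the single-unit figures by 2.4, which appears to be roughly the number of units needed to match the DGX-1’s FP16 throughput figure, and the variance column divides the DGX-1 value by that scaled number. A minimal sketch of that arithmetic, using a few rows from the table:

```python
# Minimal sketch of the arithmetic behind the table: scale the single-unit
# DIGITS figures by 2.4 (roughly the number of units needed to match the
# DGX-1's FP16 throughput figure), then divide the DGX-1 value by the scaled
# figure to get the "variance" column. Values are taken from the table above.

SCALE = 2.4

rows = {
    # metric:                       (DIGITS, DGX-1 v1)
    "Price USD (Nov 2024)":          (3000, 170100),
    "GPU FP16 Closed (TF)":          (282, 680),
    "GPU memory (GB)":               (128, 128),
    "Storage (TB)":                  (4, 7.68),
}

for metric, (digits_value, dgx1_value) in rows.items():
    scaled = digits_value * SCALE
    variance = dgx1_value / scaled
    print(f"{metric}: DIGITS x2.4 = {scaled:,.1f}, variance = {variance:.2f}x")

# Price works out to roughly 23.6x, FP16 to 1.00x, memory to 0.42x and
# storage to 0.80x, in line with the variance column above.
```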
Staring into my crystal ball
One theory I came across while researching this feature is that more and more data scientists are embracing Apple’s Mac platform because it offers a balanced approach: good performance, thanks to its unified memory architecture, at a ‘reasonable’ price. The Mac Studio with 128 GB of unified memory and a 4 TB SSD is currently on sale for $5,799.
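As a crude yardstick (my own arithmetic, which ignores differences in CPU, GPU and software ecosystem), here is what the two machines cost per gigabyte of unified memory at those list prices:

```python
# Crude price-per-gigabyte-of-unified-memory comparison, using the list prices
# quoted above. It deliberately ignores compute, storage and software ecosystem
# differences, so treat it as a talking point rather than a verdict.

systems = {
    "Mac Studio (128GB, 4TB SSD)": (5799, 128),   # (price in USD, memory in GB)
    "Project DIGITS (128GB, 4TB)": (3000, 128),
}

for name, (price_usd, memory_gb) in systems.items():
    print(f"{name}: ${price_usd / memory_gb:.2f} per GB of unified memory")

# Mac Studio: ~$45.30/GB vs Project DIGITS: ~$23.44/GB
```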
So where does Nvidia go from there? An obvious step would be to integrate the memory into the SoC, similar to what Apple has done with its M-series SoC (and AMD with its HBM-powered Epyc). This would not only save costs, but also improve performance, something its bigger brother, the GB200, already does.
Then it will depend on whether Nvidia wants to offer more at the same price or the same performance at a lower price (or a bit of both). Nvidia could follow Intel’s lead and use the GB10 as a prototype to encourage other key partners (PNY, Gigabyte, Asus) to launch similar projects (Intel did that with the Next Unit of Computing or NUC).
I am also extremely interested to know what will happen to the Jetson Orin family; the NX 16GB version was upgraded just a few weeks ago to offer 157 TOPS in INT8 performance. This platform is destined to fulfill more DIY/edge use cases rather than pure training/inference tasks, but I can’t help but think about “What if” scenarios.
Nvidia is clearly disrupting itself before others try; the question is how far this will go.