“How do these ‘sniff your coffee’ numbers arise?”: Expert questions the validity of zettascale- and exascale-class AI supercomputers, offering a simple, convincing car analogy to explain why not all FLOPS are the same
A leading expert has raised critical questions about the validity of claims surrounding “Zettascale” and “Exascale class” AI supercomputers.
In an article that delves deeply into the technical complexities of these terms, Doug Eadline explains how terms like exascale, which traditionally refers to computers that perform one quintillion (10^18) floating point operations per second (FLOPS), are often misused or misrepresented, especially in the context of AI workloads.
Eadline points out that many of the recent announcements touting exascale or even zettascale performance are based on speculative projections rather than tested results. He asks, “How do these ‘sniff your coffee’ numbers come from unbuilt systems?” – a question that highlights the gap between theoretical peak performance and actual measured results in high-performance computing. The term exascale has traditionally been reserved for systems that achieve at least 10^18 FLOPS of sustained double-precision (64-bit) computation, a standard verified by benchmarks such as High-Performance LINPACK (HPLinpack).
Car comparison
As Eadline explains, the distinction between FLOPS in AI and HPC is crucial. While AI workloads often rely on lower precision floating point formats such as FP16, FP8 or even FP4, traditional HPC systems require higher precision for accurate results.
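The practical effect of dropping precision is easy to demonstrate. As a minimal sketch (not from Eadline's article), Python's standard `struct` module can round-trip a value through IEEE double precision (`"d"`, FP64) and half precision (`"e"`, FP16): a perturbation that FP64 stores exactly is rounded away entirely in FP16.

```python
import struct

def roundtrip(fmt, x):
    """Pack x at the given precision, then unpack back to a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

x = 1.0 + 2**-12          # a tiny perturbation above 1.0

print(roundtrip("d", x))  # FP64 ("d"): 53 significand bits keep it exactly
print(roundtrip("e", x))  # FP16 ("e"): 11 significand bits round it back to 1.0
```

Formats like FP8 and FP4 have no `struct` codes, but the effect only gets more drastic as the significand shrinks further.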
Quoting these lower-precision numbers leads to exaggerated claims of exaFLOP or even zettaFLOP performance. According to Eadline, calling such a figure “AI zettaFLOPS” is foolish because no AI was running on the unfinished machine.
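The arithmetic behind such headline numbers is simple to reproduce. The sketch below uses invented figures (no real accelerator is being described): on many chips, throughput roughly doubles each time the precision is halved, so a modest FP64 machine becomes “near-exascale” once the marketing switches to FP4.

```python
# Hypothetical accelerator -- all figures invented for illustration.
fp64_flops = 50e15            # 50 petaFLOPS at double precision (FP64)

# Common pattern: throughput roughly doubles with each halving of precision.
fp32_flops = fp64_flops * 2
fp16_flops = fp64_flops * 4
fp8_flops = fp64_flops * 8
fp4_flops = fp64_flops * 16   # 8e17 FLOPS -- suddenly "near-exascale"

print(f"FP64 FLOPS:     {fp64_flops:.1e}")
print(f"FP4 'AI FLOPS': {fp4_flops:.1e}")
```

The hardware has not changed; only the precision of each operation, and with it the meaning of the word FLOPS, has.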
He further emphasizes the importance of verified benchmarks such as HPLinpack, which has been the standard for measuring HPC performance since 1993, and warns that theoretical peak numbers can be misleading.
The two supercomputers currently in the exascale club – Frontier at Oak Ridge National Laboratory and Aurora at Argonne National Laboratory – have been tested with real applications, unlike many of the AI systems that make exascale claims.
To explain the difference between the various floating point formats, Eadline offers a car analogy: “The average FP64 double-precision car weighs approximately 4,000 pounds (1,814 kilos). It is great at navigating terrain, seats four people comfortably, and gets 30 MPG. Now consider the FP4 car, which has been stripped down to 250 pounds (113 kilos) and gets an astonishing 480 MPG. Great news, except you failed to mention a few features of your fantastic FP4 car. First of all, the car is stripped of everything except a small engine and maybe a seat. Additionally, the wheels are 16-sided (2^4) and make for a bumpy ride compared to the smooth ride of the FP64 sedan, whose wheels have somewhere around 2^64 sides. There may be places where your FP4 car works fine, like cruising down Inference Lane, but on the FP64 HPC highway it won’t do well.”
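The wheel sides in the analogy correspond to the number of distinct bit patterns each format can encode, which a few lines of Python can tabulate (a sketch using only the formats' total bit widths):

```python
# An n-bit format encodes 2**n distinct bit patterns -- the "number of
# sides" on the wheels in Eadline's analogy.
for name, bits in [("FP4", 4), ("FP8", 8), ("FP16", 16), ("FP64", 64)]:
    print(f"{name:5s}: 2**{bits:<2d} = {2**bits:,} patterns")
```

Sixteen patterns versus roughly 1.8 × 10^19: a rounding of the road that no suspension can smooth out.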
Eadline’s article reminds us that while AI and HPC are moving closer, the standards for measuring performance in these areas remain different. As he puts it, “Solving things with ‘AI FLOPS’ won’t help either,” noting that only verified systems that meet the stringent requirements for double-precision calculations should be considered true exascale or zettascale systems.