Nvidia’s Biggest Rival Destroys Cloud Giants in AI Performance Once Again; Cerebras Inference is 75x faster than AWS, 32x faster than Google on Llama 3.1 405B


  • Cerebras achieves 969 tokens/second on Llama 3.1 405B, 75x faster than AWS
  • Claims the industry's lowest latency at 240ms, twice as fast as Google Vertex
  • Cerebras Inference runs on the CS-3 with the WSE-3 AI processor

Cerebras Systems says it has set a new benchmark in AI performance with Meta’s Llama 3.1 405B model, achieving an unprecedented generation rate of 969 tokens per second.

Third-party benchmarking company Artificial Analysis has claimed that this performance is up to 75 times faster than GPU-based offerings from major hyperscalers. It was almost six times faster than SambaNova at 164 tokens per second, more than 32 times faster than Google Vertex at 30 tokens per second, and far surpassed Azure at just 20 tokens per second and AWS at 13 tokens per second.
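
As a quick sanity check, the multipliers quoted here follow directly from the throughput figures Artificial Analysis reported. A minimal Python sketch (provider names and tokens-per-second values taken from the measurements above):

```python
# Speedup multipliers implied by the Artificial Analysis
# throughput figures quoted above (tokens per second).
throughput = {
    "Cerebras": 969,
    "SambaNova": 164,
    "Google Vertex": 30,
    "Azure": 20,
    "AWS": 13,
}

# Divide Cerebras throughput by each rival's to get the "x faster" figure.
for provider, tps in throughput.items():
    if provider == "Cerebras":
        continue
    print(f"Cerebras vs {provider}: {throughput['Cerebras'] / tps:.1f}x")

# Prints:
# Cerebras vs SambaNova: 5.9x
# Cerebras vs Google Vertex: 32.3x
# Cerebras vs Azure: 48.5x
# Cerebras vs AWS: 74.5x
```

The 74.5x AWS ratio is where the headline "75x" comes from, and the 32.3x Google Vertex ratio matches the "32x" claim.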