Apple embraces Nvidia GPUs to accelerate LLM inference via its open source ReDrafter technology


  • ReDrafter delivers 2.7x more tokens per second than conventional auto-regressive decoding
  • ReDrafter can reduce latency for users while using fewer GPUs
  • Apple has not said when ReDrafter will be deployed on competing AI GPUs from AMD and Intel

Apple has announced a partnership with Nvidia to accelerate the inference of large language models using the open source technology Recurrent Drafter (or ReDrafter for short).

The partnership aims to address the computational cost of auto-regressive token generation, a key step toward improving efficiency and reducing latency in real-time LLM applications.
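
ReDrafter is a speculative decoding approach: a small, cheap draft model proposes several tokens ahead, and the large target model verifies them, so multiple tokens can be accepted per expensive forward pass instead of one. The snippet below is a minimal, hypothetical sketch of that general idea, not Apple's or Nvidia's implementation; `toy_target_model`, `toy_draft_model`, and `speculative_decode` are illustrative stand-ins invented for this example.

```python
from typing import List


def toy_target_model(prefix: List[int]) -> int:
    # Stand-in for one expensive LLM forward pass: greedily picks the next token.
    return (sum(prefix) * 31 + 7) % 1000


def toy_draft_model(prefix: List[int]) -> int:
    # Stand-in for a small, cheap draft model; agrees with the target most of the time.
    guess = (sum(prefix) * 31 + 7) % 1000
    return guess if len(prefix) % 5 else (guess + 1) % 1000  # occasionally wrong


def speculative_decode(prompt: List[int], steps: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft model proposes up to k tokens,
    the target model verifies them and keeps the longest correct prefix."""
    tokens = list(prompt)
    while steps > 0:
        # 1. Draft up to k candidate tokens cheaply, one after another.
        draft = []
        ctx = list(tokens)
        for _ in range(min(k, steps)):
            t = toy_draft_model(ctx)
            draft.append(t)
            ctx.append(t)

        # 2. Verify the drafts with the target model. In a real system this
        #    verification is a single batched forward pass on the GPU, which
        #    is where the latency savings come from.
        accepted = 0
        for i, t in enumerate(draft):
            correct = toy_target_model(tokens + draft[:i])
            if t == correct:
                accepted += 1
            else:
                # Replace the first wrong draft token with the target's choice, then stop.
                draft = draft[:i] + [correct]
                accepted = i + 1
                break

        tokens.extend(draft[:accepted])
        steps -= accepted
    return tokens


print(speculative_decode([1, 2, 3], steps=12))
```

When the draft model agrees with the target often, each verification pass accepts several tokens at once, which is how techniques in this family can report multi-x gains in tokens per second while leaving the target model's outputs unchanged.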