‘Our models are preferred by human reviewers’: How Apple’s entry-level models outperform established rivals – On-device or server responses indicate Apple is already competitive
Perhaps the highlight of this year’s WWDC, Apple Intelligence is tightly integrated into iOS 18, iPadOS 18, and macOS Sequoia, and includes advanced generative models specialized for everyday tasks like writing, text refinement, notification summarization, image creation, and automating app interactions.
The system includes a 3 billion-parameter on-device language model and a larger server-based model running on Apple silicon servers via Private Cloud Compute (PCC). Apple says these core models, along with a coding model for Xcode and a diffusion model for visual expression, support a wide range of user and developer needs.
The company also adheres to Responsible AI principles, ensuring tools empower users, represent diverse communities, and protect privacy through on-device processing and secure PCC. Apple says its models are trained on licensed and publicly available data, with filters to remove personal information and low-quality content. The company employs a hybrid data strategy, combining human-annotated and synthetic data, and uses new algorithms for post-training improvements.
Human graders
For inference performance, Apple says it has optimized its models with techniques such as grouped-query attention, low-bit palettization, and dynamic adapters. The on-device model uses a vocabulary of 49K tokens, while the server model uses 100K tokens and supports additional languages and technical tokens. According to Apple, the on-device model achieves a generation rate of 30 tokens per second, with further improvements through token speculation.
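To give a sense of what low-bit palettization means in general, here is a minimal sketch. It is not Apple's implementation; the function names, the sample weights, and the simple one-dimensional k-means clustering are all assumptions made for illustration. The idea is that each weight is replaced by a short index into a small shared table (the palette), so a 2-bit palette stores only four distinct values.

```python
# Illustrative only: a generic palettization sketch, not Apple's code.
# All names and values here are hypothetical.

def nearest(palette, value):
    """Return the index of the palette entry closest to value."""
    return min(range(len(palette)), key=lambda j: abs(value - palette[j]))


def palettize(weights, bits, iterations=10):
    """Cluster weights into 2**bits shared values using 1-D k-means."""
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    # Start with palette entries spaced evenly across the weight range.
    palette = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        indices = [nearest(palette, w) for w in weights]
        for j in range(k):
            members = [w for w, i in zip(weights, indices) if i == j]
            if members:  # leave unused entries where they are
                palette[j] = sum(members) / len(members)
    indices = [nearest(palette, w) for w in weights]
    return palette, indices


def reconstruct(palette, indices):
    """Look each index up in the palette to recover approximate weights."""
    return [palette[i] for i in indices]


weights = [0.11, 0.09, -0.52, -0.48, 0.93, 0.95, 0.10, -0.50]
palette, indices = palettize(weights, bits=2)
approx = reconstruct(palette, indices)
max_error = max(abs(w - a) for w, a in zip(weights, approx))
```

Storing a 2-bit index per weight plus a four-entry table takes far less space than a full-precision number per weight, at the cost of the small approximation error captured in `max_error`.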
Adapters are small neural network modules that fine-tune the models for specific tasks. They leave the base model’s parameters unchanged and specialize it for targeted functions. These adapters are loaded dynamically, ensuring efficient memory usage and responsiveness.
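The adapter idea can be sketched in a few lines. This is a minimal illustration that assumes a low-rank adapter added on top of a frozen layer, a design the article does not spell out; the class name `AdaptedLayer`, the task name `"summarize"`, and all the numbers are hypothetical.

```python
# Illustrative only: a frozen base layer with swappable low-rank adapters.
# Class, method, and task names are hypothetical, not an Apple API.

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class AdaptedLayer:
    def __init__(self, base_weights):
        self.base_weights = base_weights  # shared by all tasks, never modified
        self.adapters = {}                # task name -> (down, up) matrices
        self.active = None                # no adapter selected by default

    def register(self, task, down, up):
        """Store a small pair of matrices for one task."""
        self.adapters[task] = (down, up)

    def activate(self, task):
        """Switch adapters at run time; None falls back to the base model."""
        if task is not None and task not in self.adapters:
            raise KeyError(task)
        self.active = task

    def forward(self, x):
        y = matvec(self.base_weights, x)
        if self.active is not None:
            down, up = self.adapters[self.active]
            # Project down to a small dimension, then back up, and add
            # the result to the base layer's output.
            delta = matvec(up, matvec(down, x))
            y = [a + b for a, b in zip(y, delta)]
        return y


base = [[1.0, 0.0], [0.0, 1.0]]
layer = AdaptedLayer(base)
layer.register("summarize", down=[[1.0, 1.0]], up=[[0.5], [-0.5]])

x = [2.0, 4.0]
plain = layer.forward(x)      # base model only
layer.activate("summarize")
adapted = layer.forward(x)    # base model plus the task adapter
```

Because only the small `down` and `up` matrices differ between tasks, switching tasks means swapping a few parameters while the large base weights stay in memory untouched.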
Safety and helpfulness are paramount for Apple Intelligence, the Cupertino-based tech giant emphasizes, and the company evaluates its models through human review, focusing on real-world prompts across categories. The company claims its on-device model outperforms larger competitors such as Phi-3-mini and Mistral-7B, while its server model rivals DBRX-Instruct and GPT-3.5-Turbo. Apple backs this up with its claim that human evaluators prefer its models over established rivals in several benchmarks, some of which can be viewed below.