Meta unveiled the first generation of its in-house AI inference accelerator in 2023 to power the ranking and recommendation models at the heart of Facebook and Instagram.
Despite its name, the Meta Training and Inference Accelerator (MTIA) chip handles inference but not training. It was updated in April, doubling the compute and memory bandwidth of the first generation.
At last month's Hot Chips symposium, Meta presented its next-generation MTIA and admitted that using GPUs for recommendation engines is not without its challenges. The social media giant noted that peak performance does not always translate to effective performance, that large deployments can be resource-intensive, and that capacity constraints are exacerbated by the growing demand for generative AI.
Mysterious memory expansion
With this in mind, Meta’s development goals for the next-generation MTIA include improving performance per total cost of ownership (TCO) and per watt compared to the previous generation, efficiently processing models across multiple Meta services, and improving developer efficiency to quickly achieve high-volume deployments.
Meta’s latest MTIA delivers a significant gen-over-gen performance boost: GEMM throughput rises 3.5x to 177 TFLOPS at BF16, hardware-based tensor quantization provides accuracy comparable to FP32, and optimized support for PyTorch Eager Mode enables job launch times under 1μs and job replacement under 0.5μs. Additionally, TBE (Table Batched Embedding) optimization improves the download and prefetch times of embedding indices, achieving 2-3x faster runtimes compared to the previous generation.
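To make the TBE point concrete, here is a minimal plain-PyTorch sketch of the access pattern a table-batched embedding operator accelerates. The table sizes are illustrative assumptions, not Meta's production figures, and the unfused per-table loop below is a reference point rather than Meta's implementation:

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not Meta's figures): recommendation
# models hold many embedding tables, one per sparse feature.
NUM_TABLES, ROWS_PER_TABLE, DIM, BATCH = 4, 10_000, 64, 8

# One EmbeddingBag per sparse feature; a table-batched (TBE-style)
# operator fuses these lookups into a single kernel so that indices
# for all tables can be downloaded and prefetched together.
tables = nn.ModuleList(
    nn.EmbeddingBag(ROWS_PER_TABLE, DIM, mode="sum") for _ in range(NUM_TABLES)
)

# Each sample supplies a list of indices per table; fixed at 3 here
# for simplicity.
indices = torch.randint(ROWS_PER_TABLE, (NUM_TABLES, BATCH, 3))

# Unfused reference: one lookup kernel per table. Batching these
# lookups is what MTIA's embedding-index prefetch optimization targets.
pooled = torch.cat([t(ix) for t, ix in zip(tables, indices)], dim=1)
print(pooled.shape)  # torch.Size([8, 256])
```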
The MTIA chip, built on TSMC’s 5nm process, runs at 1.35GHz with a gate count of 2.35 billion and offers 354 TOPS of INT8 and 177 TFLOPS of FP16 GEMM performance. It pairs 128GB of LPDDR5 memory with 204.8GB/s of bandwidth, all within a TDP of 90 watts.
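A quick back-of-the-envelope calculation from these published figures (the arithmetic is mine, not Meta's) shows why the memory system matters as much as the ALUs for embedding-heavy recommendation workloads:

```python
# Published MTIA figures; the derived numbers below are illustrative.
peak_int8_ops = 354e12    # dense INT8 GEMM, ops/s
peak_fp16_ops = 177e12    # dense FP16/BF16 GEMM, ops/s
mem_bw = 204.8e9          # LPDDR5 bandwidth, bytes/s
tdp = 90                  # watts

# Simple roofline: arithmetic intensity needed to stay compute-bound.
# Below ~1729 INT8 ops per byte fetched, the chip is memory-bound --
# exactly the regime sparse embedding lookups tend to fall into.
print(peak_int8_ops / mem_bw)  # ~1728.5 ops/byte

# Efficiency headline: roughly 3.9 INT8 TOPS/W and ~2 FP16 TFLOPS/W.
print(peak_int8_ops / 1e12 / tdp, peak_fp16_ops / 1e12 / tdp)
```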
The Processing Elements are built on RISC-V cores with both scalar and vector extensions, and Meta’s accelerator module includes dual CPUs. At Hot Chips 2024, ServeTheHome noted a memory expansion tied to the PCIe switch and the CPUs. When asked whether this was CXL, Meta rather coyly replied, “it is an option to add memory to the chassis, but it is not currently deployed.”