Meta recently shared details about the company’s AI training infrastructure, revealing that it currently relies on nearly 50,000 Nvidia H100 GPUs to train its open source Llama 3 LLM.
Like many major tech companies involved in AI, Meta is looking to reduce its dependence on Nvidia’s hardware and has taken another step in that direction.
Meta already has its own AI inference accelerator, the Meta Training and Inference Accelerator (MTIA), which is tailor-made for the social media giant’s internal AI workloads, especially the models that power experiences across its products. The company has now shared insights into its second-generation MTIA, which improves significantly on its predecessor.
Double the compute and memory bandwidth
This updated version of MTIA, which handles inference but not training, doubles the compute and memory bandwidth of the previous solution while maintaining a tight fit with Meta’s workloads. It is designed to efficiently serve the ranking and recommendation models that provide suggestions to users. The new chip architecture aims to strike a balance between compute power, memory bandwidth and memory capacity to meet the particular needs of these models, and its larger SRAM enables high performance even at small batch sizes.
The latest accelerator consists of an 8×8 grid of processing elements (PEs) that deliver 3.5 times the dense compute performance of MTIA v1, and sparse compute performance that is said to be seven times better. The gains come from architectural optimizations around the sparse compute pipeline and the way data is fed into the PEs. Key upgrades include triple the local PE storage, double the on-chip SRAM with 3.5x the bandwidth, and double the LPDDR5 capacity.
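Meta has not published MTIA kernel code, but the emphasis on the sparse pipeline makes sense because ranking and recommendation models, such as Meta’s open source DLRM, spend much of their time on sparse embedding lookups rather than dense matrix math. A minimal PyTorch sketch of that access pattern (the table size and IDs here are illustrative, not MTIA-specific):

```python
import torch
import torch.nn as nn

# Recommendation models gather a handful of rows from very large
# embedding tables, so the workload is sparse and memory-bound.
NUM_IDS, DIM = 1_000_000, 128          # illustrative table size
table = nn.EmbeddingBag(NUM_IDS, DIM, mode="sum")

# A small inference batch: each sample looks up a few sparse feature IDs.
ids = torch.tensor([3, 42, 99_999, 7, 123_456])
offsets = torch.tensor([0, 3])         # 2 samples: ids[0:3] and ids[3:5]

pooled = table(ids, offsets)           # shape: (2, 128) pooled embeddings
print(pooled.shape)
```

At small batch sizes, keeping hot embedding rows resident in a large on-chip SRAM avoids round trips to off-chip memory, which is why the SRAM and bandwidth increases matter for this class of model.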
In addition to the hardware, Meta is co-designing the software stack with the silicon to arrive at an optimal overall inference solution. The company says it has developed a rack-based system that can accommodate up to 72 accelerators, with each chip clocked at 1.35 GHz and drawing 90 W.
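As a quick back-of-the-envelope check (my arithmetic, not a figure from Meta), the accelerators alone put such a rack at roughly 6.5 kW:

```python
# Rough accelerator-only power budget for one MTIA v2 rack,
# from the figures above (excludes hosts, fabric and cooling).
ACCELERATORS_PER_RACK = 72
WATTS_PER_CHIP = 90

rack_watts = ACCELERATORS_PER_RACK * WATTS_PER_CHIP
print(f"{rack_watts / 1000:.2f} kW")   # 6.48 kW
```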
Among other developments, Meta says it has also upgraded the fabric connecting the accelerators, significantly increasing bandwidth and system scalability. Triton-MTIA, a compiler backend built to generate high-performance code for the MTIA hardware, further rounds out the software stack.
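Triton is OpenAI’s open source, Python-embedded language for writing accelerator kernels, and Triton-MTIA lowers it to MTIA silicon. The MTIA-specific backend isn’t public, but a standard Triton kernel, the kind of source such a backend consumes, looks like this vector-add sketch (shown here targeting a CUDA device, since that is the backend most readers can run):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)   # one program per block
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```

Because kernels are written once in Triton and compiled per target, a backend like Triton-MTIA lets Meta reuse the same kernel source across GPUs and its own silicon.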
The new MTIA won’t dramatically reduce Meta’s reliance on Nvidia’s GPUs in the near term, but it is another step toward that goal.