Alibaba unveils the network and data center design it uses to train large language models

Alibaba has unveiled the data center design behind its LLM training program: an Ethernet-based network in which each host contains eight GPUs and nine NICs, with each NIC providing two 200 Gb/sec ports.

The tech giant, which also offers one of the leading large language models (LLMs) through its 110-billion-parameter Qwen model, says the design has been in production for eight months and aims to maximize the use of each GPU's PCIe capabilities to increase the network's transmit/receive capacity.
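A back-of-the-envelope sketch of what those figures imply for per-host bandwidth, using only the numbers quoted above (nine NICs per host, two 200 Gb/sec ports per NIC); this is illustrative arithmetic, not Alibaba's own tooling:

```python
# Per-host network capacity implied by the article's figures.
# Assumption: all NIC ports can be driven at line rate simultaneously.

NICS_PER_HOST = 9
PORTS_PER_NIC = 2
PORT_SPEED_GBPS = 200  # Gb/sec per port

total_gbps = NICS_PER_HOST * PORTS_PER_NIC * PORT_SPEED_GBPS
print(f"Aggregate NIC capacity per host: {total_gbps} Gb/s "
      f"({total_gbps / 1000:.1f} Tb/s)")
```

At face value that works out to 3,600 Gb/sec (3.6 Tb/sec) of raw NIC capacity per eight-GPU host, which illustrates why the design focuses on keeping the GPUs' PCIe links saturated.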