At the recent Hot Chips 2024 symposium in Stanford, California, Enfabrica introduced its Accelerated Compute Fabric SuperNIC (ACF-S) silicon and system solutions.
The ACF-S is designed to scale AI networks to millions of GPUs, offering data center operators higher bandwidth, greater resiliency, lower latency, and enhanced programmatic control.
The presentation, titled “ACF-S: An 8 Terabit/sec SuperNIC for High-Performance Data Movement in AI and Accelerated Compute Networks,” was delivered by Enfabrica’s Chief Development Officer and Co-Founder Shrijeet Mukherjee, along with Technical Engineer Thomas Norrie. They discussed the architecture, design, and technical features of the company’s first-generation ACF SuperNIC silicon, codenamed “Millennium.”
Built differently
Reporting on the event, ServeTheHome noted that Enfabrica’s ACF-S aims to unify communications between scale-up (adding resources to one system) and scale-out (connecting multiple systems).
Although the network layout resembles a traditional PCIe switch-based network, the ACF-S is not a PCIe switch. Instead, it uses a logically rail-switched, two-tier Clos networking architecture that connects multiple CPUs, GPUs, and other components via ACF-S chips and GPU fabric switches. This architecture supports flexible, high-performance communications across different computing domains (such as IPC and RPC), enabling efficient processing of data-intensive tasks without the limitations of conventional PCIe switch designs.
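To make the idea of a rail-switched, two-tier Clos layout more concrete, here is a minimal Python sketch of a generic fabric of that shape. It is not Enfabrica's design: the function names (`build_rail_clos`, `switch_hops`), the node, GPU, and spine counts, and the rule that GPU index g on every node attaches to leaf switch g are all illustrative assumptions, and the model leaves out the ACF-S chips themselves as well as any intra-node interconnect.

```python
from itertools import product


def build_rail_clos(num_nodes, gpus_per_node, num_spines):
    """Return the links of a generic two-tier, rail-style Clos fabric.

    Tier 1: one leaf ("rail") switch per local GPU index, so GPU g of
            every node attaches to leaf g (an assumption of this sketch).
    Tier 2: every leaf switch attaches to every spine switch.
    """
    links = set()
    for node, gpu in product(range(num_nodes), range(gpus_per_node)):
        links.add((f"node{node}/gpu{gpu}", f"leaf{gpu}"))
    for leaf, spine in product(range(gpus_per_node), range(num_spines)):
        links.add((f"leaf{leaf}", f"spine{spine}"))
    return links


def switch_hops(src, dst):
    """Count switches crossed between two GPUs given as (node, gpu_index).

    Only the network fabric is modeled; intra-node links are ignored.
    """
    if src == dst:
        return 0
    # Same rail: a single leaf switch. Different rails: leaf -> spine -> leaf.
    return 1 if src[1] == dst[1] else 3


links = build_rail_clos(num_nodes=4, gpus_per_node=8, num_spines=2)
print(len(links))                   # 4*8 GPU links + 8*2 uplinks = 48
print(switch_hops((0, 3), (2, 3)))  # same rail -> 1
print(switch_hops((0, 3), (2, 5)))  # different rails -> 3
```

In this toy model, traffic between GPUs on the same rail crosses a single switch, while traffic between rails crosses the second tier. That is the general property a rail-style layout is meant to exploit.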
The ACF-S “Millennium” chip is a key component designed to deliver resilient networking for GPUs with 3.2 Tbps of bandwidth per accelerator. It features a full router, multi-planar internal switch fabric, and user-programmable transport, supporting scalable infrastructure with up to 40,000 copy engines and data movers.
Enfabrica notes that the Millennium chip is built differently, incorporating higher on-chip I/O density, crossbar NICs, scalable memory translation, and shared flow buffer and packet processing, all of which improve performance and efficiency.
Enfabrica’s approach focuses on maximizing computational efficiency through tighter hardware and software integration, better I/O and memory scalability, and smart traffic management that improves network performance and system resiliency. ServeTheHome summarizes: “It’s like taking a bunch of NICs and combining them, and PCIe switches, and combining them all into one. The other interesting use case is that you can add CXL memory to the ACF-S fabric and present pools of CXL memory without any hosts. This is super cool.”