At Supercomputing 2024 (SC24), Enfabrica Corporation unveiled a milestone in AI data center networking: the Accelerated Compute Fabric (ACF) SuperNIC chip. This 3.2 terabit-per-second (Tbps) network interface card (NIC) SoC is built for large-scale AI and machine learning (ML) operations, supporting clusters of over 500,000 GPUs. Enfabrica has also raised $115 million in funding and expects to release the ACF SuperNIC in Q1 2025.
As AI models grow increasingly large and sophisticated, data centers face mounting pressure to connect large numbers of specialized processing units, such as GPUs. These GPUs are crucial for high-speed computation in training and inference but are often left idle due to inefficient data movement across existing network architectures. The challenge lies in effectively interconnecting thousands of GPUs to ensure optimal data transfer without bottlenecks or performance degradation.
Traditional networking approaches can link approximately 100,000 AI computing chips in a data center before inefficiencies and slowdowns become significant. According to Enfabrica’s CEO, Rochan Sankar, the company’s new technology supports up to 500,000 chips in a single AI/ML system, enabling larger and more reliable AI model computations. By overcoming the constraints of conventional NIC designs, Enfabrica’s ACF SuperNIC maximizes GPU utilization and minimizes downtime.
The ACF SuperNIC boasts several industry-first features tailored to modern AI data center needs:
Traditional systems often require one-to-one connections between GPUs and various components, such as PCIe switches and RDMA NICs. However, as the number of GPUs in a system increases, the risk of link failures between GPUs and switches grows; in setups with over 100,000 GPUs, disruptions can occur as often as every 23 minutes, according to Sankar.
The ACF SuperNIC addresses this issue by enabling multiple connections from GPUs to switches. This redundancy minimizes the impact of individual component failures, boosting system uptime and reliability.
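The scale effect behind these claims can be illustrated with a back-of-envelope reliability model. The numbers below (per-link MTBF, path downtime) are illustrative assumptions, not Enfabrica specifications:

```python
def fleet_failure_interval_minutes(num_links: int, link_mtbf_hours: float) -> float:
    """Expected minutes between failures anywhere in the fleet, assuming
    independent links that each fail on average once per MTBF period."""
    return (link_mtbf_hours * 60) / num_links

# With one GPU-to-switch link per GPU and an assumed ~5-year per-link MTBF,
# a 100,000-GPU cluster sees a failure somewhere roughly every 26 minutes:
assumed_mtbf_hours = 5 * 365 * 24  # ~43,800 hours (assumption)
print(fleet_failure_interval_minutes(100_000, assumed_mtbf_hours))  # ≈ 26.3

def unreachable_fraction(p_path_down: float, redundant_paths: int) -> float:
    """Fraction of time a GPU is cut off, assuming independent paths:
    the GPU is unreachable only when every path is down at once."""
    return p_path_down ** redundant_paths

# If a single path is down 0.1% of the time, adding one independent
# redundant path cuts unreachability from 1e-3 to 1e-6:
print(unreachable_fraction(1e-3, 1))  # 0.001
print(unreachable_fraction(1e-3, 2))  # 1e-06
```

The point of the sketch: fleet-wide failure frequency grows linearly with link count, while multipath redundancy shrinks per-GPU downtime multiplicatively, which is why multiple GPU-to-switch connections matter at this scale.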
The SuperNIC also introduces the Collective Memory Zoning feature, which supports zero-copy data transfers and optimizes host memory management. By reducing latency and enhancing memory efficiency, this technology maximizes the floating-point operations per second (FLOPs) utilization of GPU server fleets.
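Why reduced transfer latency translates into higher FLOPs utilization can be framed with a simple two-phase model of a training step; the step and stall times below are hypothetical, chosen only to show the shape of the effect:

```python
def flops_utilization(compute_ms: float, stall_ms: float) -> float:
    """Fraction of wall-clock time spent on useful FLOPs, treating each
    training step as a compute phase followed by a data-transfer stall."""
    return compute_ms / (compute_ms + stall_ms)

# Assumed step: 100 ms of compute. A staged-copy path that stalls the GPU
# for 25 ms caps utilization at 80%; a zero-copy path that cuts the stall
# to 5 ms lifts it to roughly 95%.
print(flops_utilization(100, 25))  # 0.8
print(flops_utilization(100, 5))   # ≈ 0.952
```

In this model, any feature that shortens or hides the stall phase, as zero-copy transfers aim to do, raises the effective FLOPs delivered by the same hardware fleet.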
The ACF SuperNIC’s design is not only about scale but also about operational efficiency. It provides a software stack that integrates with standard communication stacks, existing interfaces, and RDMA networking operations. This compatibility enables deployment across heterogeneous AI compute environments composed of GPUs and accelerators (AI chips) from different vendors. Data center operators benefit from streamlined networking infrastructure, reducing complexity and enhancing the flexibility of their AI data centers.
Enfabrica’s ACF SuperNIC will be available in limited quantities in Q1 2025, with both the chips and pilot systems now open for orders through Enfabrica and selected partners. As AI models demand higher performance and larger scales, Enfabrica’s approach could play a pivotal role in shaping the next generation of AI data centers designed to support frontier AI models.