- At the Open Compute Project (OCP) Summit 2024, we’re sharing details about our next-generation network fabric for our AI training clusters.
- We’ve expanded our network hardware portfolio and are contributing two new disaggregated network fabrics and a new NIC to OCP.
- We look forward to continued collaboration with OCP to open designs for racks, servers, storage boxes, and motherboards to benefit companies of all sizes across the industry.
At Meta, we believe that open hardware drives innovation. In today’s world, where more and more data center infrastructure is being devoted to supporting new and emerging AI technologies, open hardware plays an important role in enabling disaggregation. By breaking down traditional data center technologies into their core components, we can build new systems that are more flexible, scalable, and efficient.
Since helping to found OCP in 2011, we’ve shared our data center and component designs and open-sourced our network orchestration software to spark new ideas, both in our own data centers and across the industry. These ideas have made Meta’s data centers among the most sustainable and efficient in the world. Now, through OCP, we’re bringing new open network technologies to our data centers, and to the broader industry, for advanced AI applications.
We’re announcing two new milestones for our data centers: our next-generation network fabric for AI, and a new portfolio of network hardware that we’ve developed in close partnership with several vendors.
DSF: A scheduled fabric that’s disaggregated and open
Network performance and availability play an important role in extracting the best performance from our AI training clusters. That is why we’ve continued to push for disaggregation in the backend network fabrics for our AI clusters. Over the past year we have developed a Disaggregated Scheduled Fabric (DSF) for our next-generation AI clusters, which helps us build open, vendor-agnostic systems with interchangeable building blocks from vendors across the industry. DSF-based fabrics allow us to build large, non-blocking fabrics to support high-bandwidth AI clusters.
DSF extends our disaggregation of network systems to VOQ-based switch systems powered by the open OCP-SAI standard and FBOSS, Meta’s own network operating system for controlling network switches. VOQ-based traffic scheduling ensures proactive congestion avoidance in the fabric, rather than reactive congestion signaling and response.
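The scheduling idea can be sketched in a few lines of Python. This is a toy model, not FBOSS or ASIC behavior: `VoqIngress`, `CreditScheduler`, and the credit counts are all hypothetical. The shape is the point: each ingress holds a separate virtual output queue (VOQ) per egress, and the egress grants a bounded number of credits per round, so traffic is held back at ingress instead of flooding the fabric and reacting to congestion after the fact.

```python
from collections import deque

class VoqIngress:
    """Ingress side: one virtual output queue (VOQ) per egress port."""
    def __init__(self, num_egress):
        self.voqs = [deque() for _ in range(num_egress)]

    def enqueue(self, egress_port, packet):
        self.voqs[egress_port].append(packet)

class CreditScheduler:
    """Egress side grants transmission credits up front, so ingresses
    only send what the egress can absorb (proactive congestion
    avoidance), instead of reacting to drops or pause frames."""
    def __init__(self, credits_per_round):
        self.credits_per_round = credits_per_round

    def run_round(self, ingresses, egress_port):
        delivered = []
        credits = self.credits_per_round
        while credits > 0:
            progressed = False
            # Round-robin over ingresses holding traffic for this egress.
            for ing in ingresses:
                if credits > 0 and ing.voqs[egress_port]:
                    delivered.append(ing.voqs[egress_port].popleft())
                    credits -= 1
                    progressed = True
            if not progressed:
                break
        return delivered
```

For example, if two ingresses each queue three packets for the same egress and the scheduler grants four credits in a round, four packets are delivered fairly across the ingresses and two remain queued at ingress rather than congesting the fabric.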
The DSF fabric supports an open, standard, Ethernet-based RoCE interface to endpoints and accelerators across several xPUs and NICs, including Meta’s MTIA as well as parts from several vendors.
DSF platforms for next-generation AI fabrics
Arista 7700R4 series
The DSF platforms in the Arista 7700R4 series consist of dedicated leaf and spine systems that are combined to create a large, distributed switch. As a distributed system, DSF is designed to support high-scale AI clusters.
7700R4C-38PE: DSF Leaf Switch
- DSF distributed leaf switch (Broadcom Jericho3-AI based)
- 18 x 800GE (36 x 400GE) OSFP800 host ports
- 20 x 800Gbps (40 x 400Gbps) fabric ports
- 14.4 Tbps of wirespeed performance with 16GB of buffers
7720R4-128PE: DSF Spine Switch
- DSF distributed spine switch (Broadcom Ramon3 based)
- Accelerated compute optimized pipeline
- 128 x 800Gbps (256 x 400Gbps) fabric ports
- 102.4 Tbps of wirespeed performance
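The headline bandwidth figures above are simply port count times per-port rate; a quick sanity check in plain arithmetic (nothing vendor-specific):

```python
def wirespeed_tbps(ports, gbps_per_port):
    """Aggregate wirespeed = port count x per-port rate, in Tbps."""
    return ports * gbps_per_port / 1000

# Leaf host side: 18 x 800GE -> 14.4 Tbps
assert wirespeed_tbps(18, 800) == 14.4
# Spine fabric side: 128 x 800Gbps -> 102.4 Tbps
assert wirespeed_tbps(128, 800) == 102.4
# The leaf's fabric side (20 x 800G = 16 Tbps) exceeds its host side,
# which is the headroom that makes a non-blocking fabric possible.
assert wirespeed_tbps(20, 800) > wirespeed_tbps(18, 800)
```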
51T switches for next-generation 400G/800G fabrics
Meta will deploy two next-generation 400G fabric switches: the Minipack3 (the latest version of Minipack, Meta’s own fabric network switch) and the Cisco 8501, both of which are also backward compatible with earlier 200G and 400G switches and can support upgrades to 400G and 800G.
The Minipack3 uses Broadcom’s latest Tomahawk5 ASIC, while the Cisco 8501 is based on Cisco’s Silicon One G200 ASIC. These high-performance switches transmit up to 51.2 Tbps across 64 OSFP ports, and their designs eliminate the need for retimers to achieve maximum power efficiency. They also consume significantly less power per bit than their predecessor models.
Meta will run both the Minipack3 and the Cisco 8501 on FBOSS.
Optics: 2x400G FR4 optics for 400G/800G optical interconnects
Meta’s data center fabrics have evolved from 200 Gbps/400 Gbps to 400 Gbps/800 Gbps, and we’ve already deployed 2x400G optics in our data centers.
Evolving FBOSS and SAI for DSF
We continue to embrace OCP-SAI to onboard new network fabrics, switch hardware platforms, and optical transceivers to FBOSS. We have collaborated with vendors, and the OCP community, to evolve SAI, which now supports new features and concepts such as DSF and other enhanced routing schemes.
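To make “vendor-agnostic systems with interchangeable building blocks” concrete, here is a deliberately simplified Python sketch of the pattern a hardware abstraction layer like SAI enables. Real SAI is a C API with a far richer object model; every name below (`SwitchAbstraction`, `VendorAsic`, `bring_up`) is hypothetical:

```python
from abc import ABC, abstractmethod

class SwitchAbstraction(ABC):
    """Toy stand-in for a SAI-style hardware abstraction layer: the
    network OS programs against this interface, and each vendor ASIC
    supplies its own implementation underneath."""
    @abstractmethod
    def create_port(self, lanes, speed_gbps): ...

    @abstractmethod
    def add_route(self, prefix, next_hop): ...

class VendorAsic(SwitchAbstraction):
    """Hypothetical vendor backend; a real SDK would program hardware."""
    def __init__(self):
        self.ports, self.routes = [], {}

    def create_port(self, lanes, speed_gbps):
        self.ports.append((tuple(lanes), speed_gbps))
        return len(self.ports) - 1  # port id

    def add_route(self, prefix, next_hop):
        self.routes[prefix] = next_hop

def bring_up(hal: SwitchAbstraction):
    """NOS-side logic stays identical no matter which ASIC is underneath."""
    port = hal.create_port(lanes=[0, 1, 2, 3], speed_gbps=400)
    hal.add_route("10.0.0.0/8", next_hop=port)
    return port
```

Swapping in a different vendor’s implementation of `SwitchAbstraction` leaves `bring_up` untouched, which is the disaggregation property the text describes.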
Developers and engineers from all over the world can work with this open hardware and contribute their own software that they, in turn, can use themselves and share with the broader industry.
FBNIC: A multi-host foundational NIC designed by Meta
We’re continuing to design more ASICs, including the ASIC for FBNIC. FBNIC is a true multi-host foundational NIC and contains the first of our Meta-designed network ASICs for our server fleet and MTIA solutions. It can support up to four hosts with full datapath isolation for each host. The FBNIC driver has been upstreamed (available since the v6.11 kernel). The NIC module was designed by Marvell and has been contributed to OCP.
FBNIC’s key features include:
- Network interfaces for up to 4×100/4×50/4×25 GE, with SerDes support for up to 56G PAM4 per lane
- Up to four independent PCIe Gen5 slices
- Hardware offloads, including LSO and checksum
- Line-rate timestamping (for each host, all the way from the PHY) for PTP
- Header-data split to support zero-copy
- Compliance with the OCP NIC 3.0, version 1.2.0, design specification
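Header-data split deserves a brief illustration. In this toy Python sketch (the function and the 42-byte header length are illustrative, not the FBNIC interface), the protocol headers and the payload land in separate buffers; in real hardware the payload buffer is page-aligned, so the kernel can parse headers while handing the payload pages to the application without a copy:

```python
def header_data_split(frame: bytes, header_len: int):
    """Conceptual model of header-data split: the NIC DMAs the protocol
    headers into one buffer and the payload into a separate one, so the
    payload can later be mapped to the application with zero copies."""
    return frame[:header_len], frame[header_len:]

# Hypothetical frame: 42 bytes of Ethernet+IPv4+UDP headers, then payload.
frame = bytes(42) + b"payload-bytes"
headers, payload = header_data_split(frame, header_len=42)
```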
The future is open
Advancing AI means building data center infrastructure that goes beyond scale. It also has to allow for flexibility and perform efficiently and sustainably. At Meta, we envision a future of AI hardware systems that are not only scalable, but also open and collaborative.
We encourage anyone who wants to help advance the future of networking hardware for AI to engage with OCP and Meta to help shape the future of AI infrastructure.