Nvidia Developer, September 3
NVIDIA Optical Interconnect Technology Powers AI Data Center Transformation

As AI demand surges, NVIDIA has introduced revolutionary optical interconnect solutions, including the Spectrum-X Ethernet and Quantum InfiniBand platforms. By integrating optical components into the switch chip package (co-packaged optics, or CPO), these technologies significantly improve data center performance, energy efficiency, and reliability. Unlike traditional data centers, AI factories demand higher bandwidth and lower latency, driving a shift from CPU-centric to GPU-driven architectures that requires optical networking. By streamlining the signal path, NVIDIA's CPO technology reduces power loss and potential failure points, delivering a 3.5x gain in energy efficiency and a 10x improvement in reliability while accelerating AI factory deployment. These innovations lay the foundation for the future of agentic AI.

💡 **AI factories raise the bar for networking**: Traditional data centers rely on CPUs and copper connections, which suffice for workloads with modest networking demands. AI factories, by contrast, are built from many thousands of GPUs and place extreme demands on maximum bandwidth and minimum latency across the entire data center, making optical networking a necessity and driving a significant increase in power consumption and in the number of optical components.

⚡ **CPO optimizes energy efficiency and reliability**: Traditional switches transmit signals through pluggable optical transceivers, traversing multiple electrical interfaces that incur up to 22 dB of electrical loss and high power consumption (roughly 30 W per interface). NVIDIA's co-packaged optics (CPO) technology integrates the electro-optical conversion inside the switch package, cutting electrical loss to about 4 dB and power to 9 W, significantly improving signal integrity, reliability, and energy efficiency.

🚀 **Innovations in NVIDIA Quantum-X and Spectrum-X Photonics**: Quantum-X InfiniBand Photonics delivers 115 Tb/s of switching capacity and 14.4 TFLOPS of in-network computing, with liquid cooling. Spectrum-X Photonics, designed for large-scale Ethernet AI factories, offers up to 409.6 Tb/s of bandwidth; its CPO design brings a 3.5x gain in energy efficiency and a 10x improvement in reliability while shortening deployment time.

📈 **CPO is the inevitable direction for AI data centers**: By eliminating pluggable optical transceivers and integrating optics directly into the switch ASIC package, CPO greatly reduces per-port power consumption while increasing network density. Fewer discrete active components and the removal of failure-prone optical transceivers significantly improve uptime and operational reliability, accelerating the deployment and scaling of AI factories.

As artificial intelligence redefines the computing landscape, the network has become the critical backbone shaping the data center of the future. Large language model training performance is determined not only by compute resources but also by the agility, capacity, and intelligence of the underlying network. The industry is witnessing the evolution from traditional, CPU-centric infrastructures toward tightly coupled, GPU-driven, network-defined AI factories.

NVIDIA built a comprehensive suite of networking solutions to handle the quick-burst, high-bandwidth, and low-latency demands of modern AI training and inferencing at scale. This includes Spectrum-X Ethernet, NVIDIA Quantum InfiniBand, and BlueField platforms. By orchestrating compute and communication together, the NVIDIA networking portfolio lays the foundation for scalable, efficient, and resilient AI data centers, where the network is the central nervous system empowering the future of AI innovation.

In this blog, we’ll explore how NVIDIA networking innovations have enabled co-packaged optics to deliver massive power efficiency and resiliency improvements for large-scale AI factories. 

How does AI factory infrastructure compare to traditional enterprise data centers?

In traditional enterprise data centers, Tier 1 switches are integrated within each server’s rack, allowing direct copper connections to servers and minimizing both power and component complexity. This architecture sufficed for CPU-centric workloads with modest networking demands. 

In contrast, modern AI factories pioneered by NVIDIA feature ultra-dense compute racks and thousands of GPUs that are architected to work together on a single job. These require maximum bandwidth and minimum latency across the entire data center, which leads to new topologies where the Tier 1 switch is relocated to the end of the row. This configuration dramatically increases the distance between servers and switches, making optical networking essential. As a result, power consumption and the number of optical components rise significantly, with optics now required for both NIC-to-switch and switch-to-switch connections.
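To make the scale of this shift concrete, here is a back-of-envelope Python sketch counting the optical links a two-tier leaf/spine AI fabric would need once both NIC-to-switch and switch-to-switch hops go optical. All topology parameters are hypothetical illustrations, not NVIDIA-published figures:

```python
def optical_link_count(gpus: int, gpus_per_leaf: int, spines: int) -> dict:
    """Count optical links in a hypothetical two-tier leaf/spine fabric."""
    leaves = -(-gpus // gpus_per_leaf)   # ceiling division: leaf switches needed
    nic_to_leaf = gpus                   # one optical NIC-to-switch link per GPU
    leaf_to_spine = leaves * spines      # full mesh of leaf-to-spine uplinks
    return {
        "leaf_switches": leaves,
        "nic_to_switch_links": nic_to_leaf,
        "switch_to_switch_links": leaf_to_spine,
        "total_optical_links": nic_to_leaf + leaf_to_spine,
    }

# Hypothetical AI factory: 16,384 GPUs, 64 GPUs per leaf, 64 spine switches
print(optical_link_count(gpus=16384, gpus_per_leaf=64, spines=64))
```

Even under these modest assumptions, the fabric needs tens of thousands of optical links, which is why per-link power and reliability dominate the design conversation.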

This evolution, illustrated in Figure 1 below, reflects the substantial shift in topology and technology needed to meet the high-bandwidth, low-latency requirements of large-scale AI workloads. It fundamentally reshapes the physical and energy profile of the data center.

Figure 1. Scale-out and AI density depend on optical connectivity.

How do you optimize network reliability and power for AI factories?

Traditional network switches that utilize pluggable transceivers rely on multiple electrical interfaces. In these architectures, the data signal must traverse long electrical paths from the switch ASIC across the PCB, through connectors, and finally into the external transceiver before being converted to an optical signal. This segmented journey incurs substantial electrical loss, up to 22 dB for 200 gigabit-per-second channels, as illustrated in Figure 2 below. This loss amplifies the need for complex digital signal processing and multiple active components.

The result is a higher power draw (often 30W per interface), increased heat output, and a proliferation of potential failure points. The abundance of discrete modules and connections not only drives up system power and component count but directly undermines link reliability, creating ongoing operational challenges as AI deployments scale. Typical power consumption of components is shown below in Figure 3.

Figure 3. 3.5x power-saving with Spectrum-X Photonics

In contrast, switches with co-packaged optics (CPO) integrate the electro-optical conversion directly onto the switch package. Fiber connects directly with the optical engine that sits beside the ASIC, reducing electrical loss to only ~4 dB and slashing power use to as low as 9W. By streamlining the signal path and eliminating unnecessary interfaces, this design dramatically improves signal integrity, reliability, and energy efficiency. This is precisely what’s required for high-density, high-performance AI factories.
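The per-interface figures above make the comparison easy to quantify. The following Python sketch uses the numbers cited in the text (~30 W for pluggable transceivers vs. ~9 W for CPO) with a hypothetical port count; note that the raw per-interface ratio lands close to NVIDIA's 3.5x system-level efficiency figure:

```python
# Per-interface power figures cited in the text: ~30 W for pluggable
# transceivers vs. ~9 W for co-packaged optics (CPO).
PLUGGABLE_W, CPO_W = 30.0, 9.0

def fleet_power_kw(ports: int, per_port_w: float) -> float:
    """Total optics power for a fleet of ports, in kilowatts."""
    return ports * per_port_w / 1000.0

ports = 4096  # hypothetical port count for an AI factory fabric
pluggable = fleet_power_kw(ports, PLUGGABLE_W)
cpo = fleet_power_kw(ports, CPO_W)
print(f"pluggable: {pluggable:.1f} kW, CPO: {cpo:.1f} kW, "
      f"ratio: {pluggable / cpo:.2f}x")
```

At data-center scale, the difference is measured in tens of kilowatts of optics power alone, before counting the cooling needed to remove that heat.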

What do co-packaged optics bring to AI factories?

NVIDIA has designed CPO-based systems to meet unprecedented AI factory demands. By integrating optical engines directly onto the switch ASIC, the new NVIDIA Quantum-X Photonics and Spectrum-X Photonics (shown in Figure 4 below) will replace legacy pluggable transceivers. The new offerings streamline the signal path for enhanced performance, efficiency, and reliability. These innovations not only set new records in bandwidth and port density but also fundamentally alter the economics and physical design of AI data centers.

Figure 4. NVIDIA Photonics Switch ASICs with integrated co-packaged silicon photonics engines

How Quantum-X Photonics marks the next generation of InfiniBand networking

With the introduction of NVIDIA Quantum-X InfiniBand Photonics, NVIDIA propels InfiniBand switch technology to new heights. This platform features:

    - 115 Tb/s of switching capacity, supporting 144 ports at 800 Gb/s each
    - 14.4 teraflops of in-network computing with the fourth generation of NVIDIA Scalable Hierarchical Aggregation Reduction Protocol (SHARP) technology
    - Liquid cooling for superior thermal management
    - Dedicated InfiniBand management ports for robust in-band control and monitoring

NVIDIA Quantum-X leverages integrated silicon photonics to achieve unmatched bandwidth, ultra-low latency, and operational resilience. The co-packaged optical design reduces power consumption, improves reliability, enables rapid deployment, and supports the massive interconnect requirements of agentic AI workloads. 

How Spectrum-X Photonics enables massive scale Ethernet AI factories

Expanding the CPO revolution into Ethernet, NVIDIA Spectrum-X Photonics switches are specifically designed for generative AI and large-scale LLM training and inference tasks. The new Spectrum-X Photonics offerings include two liquid-cooled chassis based on the Spectrum-6 ASIC:

    - Spectrum SN6810: 102.4 Tb/s bandwidth with 128 ports at 800 Gb/s
    - Spectrum SN6800: 409.6 Tb/s bandwidth with a remarkable 512 ports at 800 Gb/s

Both platforms are powered by NVIDIA silicon photonics, drastically reducing the number of discrete components and electrical interfaces. The result is a 3.5x leap in power efficiency compared to previous architectures, and a 10x improvement in resiliency by reducing the number of overall optical components that may fail. Technicians benefit from improved serviceability, while AI operators see 1.3x faster time-to-turn-on and enhanced time-to-first-token.

NVIDIA’s co-packaged optics are enabled by a robust ecosystem of partners. This cross-industry collaboration ensures not only technical performance but also manufacturing scalability and reliability needed for large-scale global AI infrastructure deployments.

How CPO delivers performance, power, and reliability breakthroughs

The advantages of co-packaged optics are clear:

    - 3.5x power efficiency: By eliminating pluggable transceivers and integrating optics directly into the switch ASIC package, the power required per port drops dramatically, even as network density soars.
    - 10x higher resiliency: Fewer discrete active components and the removal of failure-prone transceivers boost uptime and operational reliability.
    - 1.3x faster time-to-operation: Streamlined assembly and maintenance translate to accelerated deployment and rapid scaling of AI factories.
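The resiliency advantage follows from simple component counting: if a link fails whenever any of its active components fails, the expected number of failures scales linearly with the component count per link. A toy Python model illustrates this; the component counts and the annual failure rate below are illustrative assumptions, not NVIDIA data:

```python
# Toy series-reliability model: a link fails if any of its active
# components fails, so expected failures scale with component count.
# All numbers below are illustrative assumptions, not NVIDIA data.

def expected_failures(links: int, components_per_link: int,
                      afr_per_component: float) -> float:
    """Expected component failures per year across the fabric (AFR = annual failure rate)."""
    return links * components_per_link * afr_per_component

LINKS = 10_000
AFR = 0.02  # assumed 2% annual failure rate per active optical component

pluggable_failures = expected_failures(LINKS, components_per_link=10,
                                       afr_per_component=AFR)
cpo_failures = expected_failures(LINKS, components_per_link=1,
                                 afr_per_component=AFR)
print(pluggable_failures, cpo_failures)  # fewer components, proportionally fewer failures
```

Under this model, cutting active components per link by 10x cuts expected failures by 10x, which is the shape of the resiliency claim even if the real failure statistics are more nuanced.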

The switch systems achieve industry-leading bandwidth (up to 409.6 Tb/s and 512 ports at 800 Gb/s), all supported by efficient liquid cooling to handle dense, high-wattage environments. Figure 5 (below) shows NVIDIA Quantum-X Photonics Q3450, and two variants of Spectrum-X Photonics—single-ASIC SN6810 and quad-ASIC SN6800 with integrated fiber shuffle.

Together, these products underpin a transformation in network architecture, meeting the insatiable bandwidth and ultra-low latency requirements posed by AI workloads. The combination of cutting-edge optical components and robust system-integration partners creates a fabric optimized for present and future scaling needs. As hyperscale data centers demand ever-faster deployment and bulletproof reliability, CPO moves from innovation to necessity. 

Figure 5. NVIDIA Quantum-X and Spectrum-X Photonics switch systems

How this ushers in the next era of agentic AI

NVIDIA Quantum-X and Spectrum-X Photonics switches signal a shift to networks purpose-built for the relentless demands of AI at scale. By eliminating bottlenecks of traditional electrical and pluggable architectures, these co-packaged optics systems deliver the performance, power efficiency, and reliability required by modern AI factories. With commercial availability for NVIDIA Quantum-X InfiniBand switches set for early 2026 and Spectrum-X Ethernet switches in the second half of 2026, NVIDIA is setting the standard for optimized networking in the age of agentic AI.

Stay tuned for the second part of this blog, where we take a look under the hood of these groundbreaking platforms. We’ll dive into the architecture and operation of the silicon photonics engines powering NVIDIA Quantum-X Photonics and Spectrum-X Photonics, shedding light on the core innovations and engineering breakthroughs that make next-generation optical connectivity possible. From advances in on-chip integration to novel modulation schemes, the next installment will unravel the technologies that set these photonics engines apart in the world of AI networking.

To learn more about NVIDIA Photonics, visit this page.
