NVIDIA Blog Oct. 13, 23:48
NVIDIA Previews New Architecture for Future Gigawatt-Scale AI Factories

At the OCP Global Summit, NVIDIA previewed its roadmap for gigawatt-scale AI factories. Highlights include the upcoming NVIDIA Vera Rubin NVL144 MGX-generation open-architecture rack servers and NVIDIA Kyber, which connects 576 Rubin Ultra GPUs. Numerous industry partners are preparing to support the next-generation 800-volt direct current (VDC) data center infrastructure, designed to improve scalability, energy efficiency and performance. Foxconn, CoreWeave, Lambda and others have begun designing and building 800 VDC data centers, Vertiv has released a corresponding reference architecture, and HPE has announced product support. The benefits of 800 VDC mirror those of technology already widely adopted in the electric vehicle and solar industries. Vera Rubin NVL144 features a modular, fully liquid-cooled design that simplifies assembly and serviceability and, as an open standard, is meant to accelerate ecosystem growth.

💡 NVIDIA is driving the next generation of AI factories, focused on gigawatt-scale data center infrastructure. With the NVIDIA Vera Rubin NVL144 MGX-generation open-architecture rack servers and NVIDIA Kyber, it aims to support large-scale AI inference demands. The new architecture uses a modular, fully liquid-cooled design that simplifies assembly and maintenance, and NVIDIA plans to contribute it to the OCP consortium as an open standard to accelerate ecosystem growth and standardization.

⚡️ Next-generation data centers will move to 800-volt direct current (VDC) power to meet the high energy demands of AI workloads. Compared with traditional 415 or 480 VAC three-phase systems, 800 VDC delivers significant gains in energy efficiency, scalability, materials usage and performance. Industry pioneers including Foxconn, CoreWeave, Lambda, Oracle Cloud Infrastructure and Together AI have begun adopting or designing 800 VDC data centers, and Vertiv has released a corresponding reference architecture.

🚀 The NVIDIA Kyber rack server architecture is a key building block of future AI factories, supporting a high-density platform of up to 576 NVIDIA Rubin Ultra GPUs and optimized for 800 VDC power delivery and liquid cooling. By stacking compute blades vertically, Kyber significantly increases GPU density per chassis and integrates NVLink switch blades for seamless scale-up networking. The design not only boosts performance but also cuts copper usage, saving customers substantial costs.

🤝 NVIDIA is building an open ecosystem to scale AI factories. More than 20 partners, including silicon suppliers, power system component providers and data center power system providers, are jointly developing and delivering rack servers built on open standards. The NVIDIA NVLink Fusion ecosystem is also expanding: the addition of Intel and Samsung Foundry will further accelerate the integration and time to market of custom AI silicon, supporting rapid AI factory deployment.

At the OCP Global Summit, NVIDIA is offering a glimpse into the future of gigawatt AI factories.

NVIDIA will unveil specs for the NVIDIA Vera Rubin NVL144 MGX-generation open-architecture rack servers, which more than 50 MGX partners are gearing up for, along with ecosystem support for NVIDIA Kyber, which connects 576 Rubin Ultra GPUs and is built to support growing inference demands.

Some 20-plus industry partners are showcasing new silicon, components, power systems and support for the next-generation, 800-volt direct current (VDC) data centers of the gigawatt era that will support the NVIDIA Kyber rack architecture.

Foxconn provided details on its 40-megawatt Taiwan data center, Kaohsiung-1, being built for 800 VDC. CoreWeave, Lambda, Nebius, Oracle Cloud Infrastructure and Together AI are among other industry pioneers designing for 800-volt data centers. In addition, Vertiv unveiled its space-, cost- and energy-efficient 800 VDC MGX reference architecture, a complete power and cooling infrastructure architecture. HPE is announcing product support for NVIDIA Kyber as well as NVIDIA Spectrum-XGS Ethernet scale-across technology, part of the Spectrum-X Ethernet platform.

Moving to 800 VDC infrastructure from traditional 415 or 480 VAC three-phase systems offers increased scalability, improved energy efficiency, reduced materials usage and higher capacity for performance in data centers. The electric vehicle and solar industries have already adopted 800 VDC infrastructure for similar benefits.

The Open Compute Project, founded by Meta, is an industry consortium of hundreds of computing and networking providers focused on redesigning hardware technology to efficiently support the growing demands on compute infrastructure.

Vera Rubin NVL144: Designed to Scale for AI Factories

The Vera Rubin NVL144 MGX compute tray offers an energy-efficient, 100% liquid-cooled, modular design. Its central printed circuit board midplane replaces traditional cable-based connections for faster assembly and serviceability, with modular expansion bays for NVIDIA ConnectX-9 800GB/s networking and NVIDIA Rubin CPX for massive-context inference.

The NVIDIA Vera Rubin NVL144 offers a major leap in accelerated computing architecture and AI performance. It’s built for advanced reasoning engines and the demands of AI agents.

Its fundamental design is based on the MGX rack architecture and will be supported by 50+ MGX system and component partners. NVIDIA plans to contribute the upgraded rack as well as the compute tray innovations as an open standard to the OCP consortium.

Its standards for compute trays and racks enable partners to mix and match in modular fashion and scale faster with the architecture. The Vera Rubin NVL144 rack design features energy-efficient 45°C liquid cooling, a new liquid-cooled busbar for higher performance and 20x more energy storage to keep power steady.

The MGX upgrades to compute tray and rack architecture boost AI factory performance while simplifying assembly, enabling a rapid ramp-up to gigawatt-scale AI infrastructure.

NVIDIA is a leading contributor to OCP standards across multiple hardware generations, including key portions of the NVIDIA GB200 NVL72 system electro-mechanical design. The same MGX rack footprint supports GB300 NVL72 and will support Vera Rubin NVL144, Vera Rubin NVL144 CPX and Vera Rubin CPX for higher performance and fast deployments.

If You Build It, They Will Come: NVIDIA Kyber Rack Server Generation

The OCP ecosystem is also preparing for NVIDIA Kyber, featuring innovations in 800 VDC power delivery, liquid cooling and mechanical design.

These innovations will support the move to rack server generation NVIDIA Kyber — the successor to NVIDIA Oberon — which will house a high-density platform of 576 NVIDIA Rubin Ultra GPUs by 2027.

The most effective way to counter the challenges of high-power distribution is to increase the voltage. Transitioning from a traditional 415 or 480 VAC three-phase system to an 800 VDC architecture offers various benefits.

The transition underway enables rack server partners to move from 54 VDC in-rack components to 800 VDC for better results. An ecosystem of direct current infrastructure providers, power system and cooling partners, and silicon makers — all aligned on open standards for the MGX rack server reference architecture — attended the event.
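To see why the in-rack voltage matters, a back-of-envelope calculation helps: for a fixed rack power, the busbar current scales inversely with voltage, and busbar copper cross-section scales roughly with current. The 1 MW rack figure below is a hypothetical value for illustration, not an NVIDIA specification.

```python
# Illustrative only (assumed rack power, not NVIDIA figures): the current an
# in-rack busbar must carry to feed a ~1 MW rack at 54 VDC vs. 800 VDC.
# At a fixed current density, copper cross-section scales roughly with current.
rack_power_w = 1_000_000     # hypothetical gigawatt-era rack load, in watts

i_54v = rack_power_w / 54    # legacy 54 VDC in-rack distribution
i_800v = rack_power_w / 800  # 800 VDC in-rack distribution

print(f"54 VDC busbar current:  {i_54v:,.0f} A")   # ~18,519 A
print(f"800 VDC busbar current: {i_800v:,.0f} A")  # 1,250 A
print(f"current reduction:      {i_54v / i_800v:.1f}x")
```

The roughly 15x drop in current is what allows far thinner conductors and lower resistive (I²R) losses for the same delivered power.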

NVIDIA Kyber is engineered to boost rack GPU density, scale up network size and maximize performance for large-scale AI infrastructure. By rotating compute blades vertically, like books on a shelf, Kyber enables up to 18 compute blades per chassis, while purpose-built NVIDIA NVLink switch blades are integrated at the back via a cable-free midplane for seamless scale-up networking.

With 800 VDC, over 150% more power is transmitted through the same copper, eliminating the need for 200-kg copper busbars to feed a single rack.
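The "more power through the same copper" claim can be sketched with a per-conductor comparison: if every conductor is limited to the same current, power per conductor is V·I/2 for a DC pair but √3·V·I·pf divided across three phase conductors plus a neutral for AC. The conductor counts and power factor below are assumptions for illustration; the exact ratio depends on them.

```python
# Back-of-envelope sketch (assumed numbers, not NVIDIA's model): power
# delivered per conductor when each conductor carries the same current limit.
import math

i_limit = 1.0                # normalized conductor ampacity
pf = 0.9                     # assumed AC power factor
v_ac_ll, v_dc = 415, 800     # line-to-line AC voltage vs. DC bus voltage
n_ac, n_dc = 4, 2            # assumed 3 phases + neutral vs. +/- DC pair

p_ac = math.sqrt(3) * v_ac_ll * i_limit * pf / n_ac  # watts per AC conductor
p_dc = v_dc * i_limit / n_dc                         # watts per DC conductor

print(f"power per conductor, AC: {p_ac:.0f} W")
print(f"power per conductor, DC: {p_dc:.0f} W")
print(f"DC delivers {p_dc / p_ac - 1:.0%} more power per unit of copper")
```

Under these assumptions the DC side delivers well over twice the power per unit of copper, in the same ballpark as the figure quoted above; different power factors, derating and conductor counts shift the exact number.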

Kyber will become a foundational element of hyperscale AI data centers, enabling superior performance, efficiency and reliability for state-of-the-art generative AI workloads in the coming years. NVIDIA Kyber racks offer a way for customers to reduce the amount of copper they use by the tons, leading to millions of dollars in cost savings.

NVIDIA NVLink Fusion Ecosystem Expands

In addition to hardware, NVIDIA NVLink Fusion is gaining momentum, enabling companies to seamlessly integrate their semi-custom silicon into highly optimized and widely deployed data center architecture, reducing complexity and accelerating time to market.

Intel and Samsung Foundry are joining the NVLink Fusion ecosystem that includes custom silicon designers, CPU and IP partners, so that AI factories can scale up quickly to handle demanding workloads for model training and agentic AI inference.

It Takes an Open Ecosystem: Scaling the Next Generation of AI Factories

More than 20 NVIDIA partners are helping deliver rack servers with open standards, enabling the future gigawatt AI factories.

Learn more about NVIDIA and the Open Compute Project at the OCP Global Summit, taking place at the San Jose Convention Center from Oct. 13-16.
