Nvidia Developer · September 3
Optimization Strategies for AI Data North-South Networks

 

In AI infrastructure, data is the fuel of the compute engine. As agentic AI systems evolve, enterprises face the challenge of moving massive amounts of data quickly, intelligently, and reliably, and north-south network performance directly affects how responsive an AI system feels. NVIDIA Enterprise Reference Architectures (Enterprise RAs) guide organizations in effectively deploying AI factories that use north-south networks, and NVIDIA Spectrum-X Ethernet plays a key role in accelerating north-south data flows, particularly for data-intensive AI use cases built with NVIDIA BlueField-3 DPUs.

Legacy Ethernet storage networks were not built for this scale, data flow, and sensitivity, and often introduce latency and congestion that degrade performance. During training, an AI model must move large amounts of data to persistent storage at every checkpoint. Inference workloads depend just as heavily on north-south efficiency, for example when retrieving embeddings from a retrieval-augmented generation (RAG) vector database or fetching external context from tools and databases. As enterprises move from static one-shot inference to dynamic, multi-turn, multi-agent inference, north-south networking demands are amplified.

NVIDIA Spectrum-X Ethernet turns the network into a lossless AI data storage and movement fabric designed for the performance demands of modern AI workloads. Converged networking simplifies infrastructure for enterprise AI workloads by consolidating east-west and north-south traffic into a unified switch fabric, reducing complexity and ensuring consistently high performance across training, inference, and retrieval workloads. Spectrum-X Ethernet prevents congestion through adaptive routing and telemetry, increasing throughput and reducing latency, while features such as virtual routing and forwarding (VRF) service separation and QoS traffic prioritization further safeguard performance. NVIDIA SuperNICs handle east-west GPU-to-GPU traffic, while BlueField-3 DPUs handle north-south traffic such as storage management, telemetry, and network security, freeing CPU resources. This dual approach lets enterprises optimize performance across every layer of their AI infrastructure.

Take the NVIDIA RAG 2.0 Blueprint as an example: it extends the capabilities of large language models (LLMs) by integrating external knowledge, using vector databases to deliver more accurate and contextually relevant responses. A user query enters the AI factory through an ingress gateway and is routed via leaf switches to a GPU server, where a BlueField-3 DPU handles packet parsing and network-stack offload. External context is fetched across the leaf-spine Spectrum-X Ethernet network, accessing NVMe storage systems over the RoCE protocol; the returned data traverses the same converged network, with the DPU handling packet reordering. Once LLM inference completes, the final response returns to the user over a VRF-isolated network. Efficient north-south networking prevents bottlenecks, keeps the system fluid and responsive, and unlocks faster decision-making and better user experiences.

📚 During AI model training, every checkpoint moves large amounts of data (up to several terabytes) to persistent storage, which depends on an efficient north-south network to avoid training interruptions and data loss.

🌐 Inference workloads rely just as heavily on north-south network efficiency, for example when retrieving embeddings from a vector database or fetching external context from tools and databases, making low-latency connectivity essential.

🚀 NVIDIA Spectrum-X Ethernet plays a key role in accelerating north-south data flows, especially for data-intensive AI use cases with NVIDIA BlueField-3 DPUs; it uses adaptive routing and telemetry to prevent congestion, increase throughput, and reduce latency.

🔗 NVIDIA SuperNICs handle east-west GPU-to-GPU traffic, while BlueField-3 DPUs handle north-south traffic (such as storage management, telemetry, and network security); this dual approach lets enterprises optimize performance across all layers of their AI infrastructure.

🧠 The NVIDIA RAG 2.0 Blueprint, for example, extends the capabilities of large language models (LLMs) by integrating external knowledge, using vector databases to deliver more accurate and contextually relevant responses, which depends on fast north-south data flow to retrieve that external knowledge.

In AI infrastructure, data fuels the compute engine. With evolving agentic AI systems, where multiple models and services interact, fetch external context, and make decisions in real time, enterprises face the growing challenge of moving massive amounts of data quickly, intelligently, and reliably. Whether it is loading a model from persistent storage, retrieving knowledge to support a query, or orchestrating agentic tool use, data movement is central to AI performance.

GPU-to-GPU (east-west) communication has long been a focus of optimization. However, equally critical are the north-south networks—handling model loading, storage I/O, and inference queries—where performance bottlenecks can directly impact the responsiveness of AI systems.

NVIDIA Enterprise Reference Architectures (Enterprise RAs) guide organizations on how to effectively deploy AI factories that use north-south networks. They are design recipes that help organizations build scalable, secure, and high-performing AI factories. Providing clear, validated pathways for deploying complex AI infrastructure, Enterprise RAs distill NVIDIA’s extensive experience into actionable recommendations, from server and network configurations to software stacks and operational best practices.

Among the many components of Enterprise RAs, NVIDIA Spectrum-X Ethernet deserves particular attention for its role in accelerating north-south data flows, especially for data-intensive AI use cases with NVIDIA BlueField-3 DPUs (data processing units). 

Legacy Ethernet storage networks, not built for the scale, data flows, and sensitivity of accelerated AI and HPC workloads, often introduce latency and congestion that degrade performance. Every time an AI model checkpoints its progress mid-training, it moves massive amounts of data across north-south pathways to persistent storage (learn how the NVIDIA-Certified Storage Program complements the Enterprise RA program). These checkpoint files, which can span several terabytes for today’s billion-parameter models, ensure that progress isn’t lost when systems go down. 
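
To make the checkpoint traffic concrete, here is a minimal sketch of how a training job might write a checkpoint to networked persistent storage. It assumes PyTorch and a hypothetical shared-filesystem mount at /mnt/checkpoints; the path and the write-then-rename pattern are illustrative only, not part of the Enterprise RA specification.

```python
import os
import torch
import torch.nn as nn

# Hypothetical mount point for NVMe-backed networked storage reached over
# the north-south fabric (the path is an assumption for illustration).
CKPT_DIR = "/mnt/checkpoints/run-001"

def save_checkpoint(model: nn.Module, optimizer: torch.optim.Optimizer, step: int) -> str:
    """Serialize model and optimizer state to persistent storage.

    Writing to a temporary name first and renaming keeps the checkpoint
    atomic: a crash mid-write never leaves a truncated file behind.
    """
    os.makedirs(CKPT_DIR, exist_ok=True)
    state = {
        "step": step,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }
    tmp_path = os.path.join(CKPT_DIR, f".step_{step:08d}.pt.tmp")
    final_path = os.path.join(CKPT_DIR, f"step_{step:08d}.pt")
    torch.save(state, tmp_path)   # the bulk of the north-south transfer happens here
    os.replace(tmp_path, final_path)
    return final_path

# Toy usage: a small model checkpointed every 1,000 steps.
if __name__ == "__main__":
    model = nn.Linear(1024, 1024)
    optimizer = torch.optim.AdamW(model.parameters())
    for step in range(0, 3000, 1000):
        save_checkpoint(model, optimizer, step)
```

For billion-parameter models, the same pattern is typically applied per rank or through a distributed checkpointing library, which multiplies the concurrent north-south write streams hitting the storage fabric.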

Inference workloads rely just as heavily on north-south efficiency. When an AI agent retrieves data, whether it’s embeddings from a retrieval-augmented generation (RAG) vector database or external context from a tool or database for a customer query, it depends on fast, latency-sensitive north-south connectivity. As enterprises shift from static one-shot inference to dynamic, multi-turn, multi-agent inference, this amplifies north-south networking demands by another order of magnitude. This happens as agents ingest, process, and update data by continuously interacting with users, external sources, and cloud services.

By using NVIDIA Spectrum-X Ethernet for accelerated data movement in Enterprise RAs, these networks become lossless AI data storage and movement fabrics—purpose-built for the performance demands of modern AI workloads. This enterprise-ready architecture enables the creation of AI factories optimized for predictable, high-throughput, low-latency data access, unlocking the full potential of modern AI workflows.

Converged networking: a simplified foundation for Enterprise AI workloads

Enterprise AI factories are often built to address a defined set of use cases, with networks typically starting in the range of 4 to 16 server nodes. At this scale, a converged design that consolidates east-west (compute) traffic and north-south (storage and external services) traffic into a unified switch fabric helps streamline operations. This design reduces complexity by minimizing cabling and hardware sprawl yet ensures consistent high-throughput performance across training, inference, and retrieval workloads. But a converged east-west/north-south network requires networking that can deliver sufficient bandwidth and quality of service (QoS) to support both types of traffic.

Spectrum-X Ethernet, which sits at the heart of Enterprise RAs, plays a key role. While originally optimized for east-west GPU-to-GPU and node-to-node communication, it delivers bandwidth and performance benefits to north-south networks and the storage data path by using adaptive routing and telemetry to prevent congestion, increase throughput, and reduce latency during AI runtime and retrieval-heavy workloads. 

Equally important are Spectrum-X Ethernet capabilities like virtual routing and forwarding (VRF) service separation and QoS traffic prioritization. VRFs logically segment east-west communication from north-south traffic, such as user ingress or storage access, without requiring physical network segmentation. QoS marks the Ethernet frame or IP packet headers so that specific traffic is prioritized depending on the use case (e.g., storage traffic over HTTPS user traffic). These mechanisms are further reinforced by advanced features such as noise isolation, which ensure consistent performance when multiple AI agents or workloads are running concurrently on shared infrastructure.
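
As a rough illustration of the marking that QoS prioritization relies on, the sketch below sets the DSCP field (carried in the ToS byte of the IPv4 header) on a socket from the application side. The DSCP class (AF41) is an arbitrary example, and in a Spectrum-X Ethernet deployment classification and prioritization are typically enforced by fabric and DPU policy rather than by application code.

```python
import socket

# DSCP "AF41" (decimal 34) stands in for a latency-sensitive traffic class;
# the actual class mapping is a fabric policy decision, not an application one.
DSCP_AF41 = 34
TOS_VALUE = DSCP_AF41 << 2  # DSCP occupies the upper 6 bits of the ToS byte

def open_marked_connection(host: str, port: int) -> socket.socket:
    """Open a TCP connection whose outgoing IPv4 packets carry a DSCP marking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)
    sock.connect((host, port))
    return sock

if __name__ == "__main__":
    # Demonstrate the option without contacting a real endpoint.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)
    print("ToS byte:", s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS))
    s.close()
```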

It’s important to note that while convergence is well-suited for enterprise-scale AI factories, it isn’t a one-size-fits-all approach. In large-scale, multi-tenant environments, such as those operated by NVIDIA Cloud Partners (NCPs), a disaggregated network model with physically separate networks may be preferred to ensure the highest effective bandwidth and enable stricter isolation between tenants and traffic types.

Converged networking is a deliberate design choice that aligns with the enterprise-scale use case, performance, and manageability needs of dedicated AI infrastructure. Enterprise RAs break down the complex task of determining the optimal network architecture for a specific use case, offering validated guidance for deployments ranging from small foundation clusters to larger builds that scale to 1K GPUs.

Understanding the role of NVIDIA Ethernet SuperNICs and BlueField-3 DPUs

To understand how networking is orchestrated in an AI factory, it’s helpful to distinguish between the roles of NVIDIA Ethernet SuperNICs and DPUs. NVIDIA SuperNICs are built specifically to handle the east-west traffic that dominates GPU-to-GPU communication. Designed for hyperscale AI environments, they deliver up to 800 Gb/s of bandwidth per GPU, ensuring ultra-fast data connectivity during distributed training and inference. 
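
For contrast with the north-south flows this post focuses on, the east-west traffic that SuperNICs carry is dominated by collective operations such as all-reduce. Below is a minimal sketch of that pattern, assuming a single multi-GPU node, PyTorch with the NCCL backend, and a torchrun launch; it illustrates the traffic shape, not a tuned training loop.

```python
import os
import torch
import torch.distributed as dist

def main() -> None:
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank holds a gradient shard; all_reduce sums it across GPUs.
    # This exchange is the east-west (GPU-to-GPU) traffic SuperNICs carry.
    grad = torch.ones(64 * 1024 * 1024, device="cuda")  # 256 MB of float32
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)
    torch.cuda.synchronize()

    if dist.get_rank() == 0:
        print(f"all-reduce complete across {dist.get_world_size()} ranks")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()

# Example launch on an assumed 8-GPU node:
#   torchrun --nproc_per_node=8 allreduce_sketch.py
```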

Complementing this, BlueField-3 DPUs take charge of north-south traffic. BlueField-3 offloads, accelerates, and separates tasks such as storage management, telemetry, and network security from the host CPU, freeing up valuable compute resources for core AI processing. In effect, it acts as a specialized cloud infrastructure processor that ensures data moves efficiently between the AI factory and its external ecosystem, including networked storage. 

Together, SuperNICs and BlueField-3 DPUs form a powerful symphony of AI networking. SuperNICs fuel and route the internal computations of the AI factory, while BlueField-3 DPUs ensure that external data feeds arrive smoothly and at scale. This dual approach enables enterprises to optimize performance across all layers of their AI infrastructure.

The enterprise impact: vector databases and real-time retrieval

A relatable example of north-south networking is found in the growing adoption of agentic AI and RAG systems. Architectures, such as the NVIDIA RAG 2.0 Blueprint, extend the capabilities of large language models (LLMs) by integrating external knowledge such as documents, images, logs, and videos. The RAG Blueprint uses NVIDIA NeMo Retriever and NVIDIA NIM microservices to embed, index, and retrieve this content using vector databases, providing more accurate and contextually relevant responses. 

When a user submits a query, the LLM creates a vector embedding, which is used to rapidly query a vector database such as Milvus, sitting in external storage, for the most relevant embedded context. This interaction hinges on fast (low-latency) north-south data flow. The sooner the system retrieves and integrates this external knowledge, the faster and more precise its response. A converged Spectrum-X Ethernet network optimizes this data path, ensuring minimal latency and maximum throughput as models fetch embeddings in real time. 
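
As a concrete sketch of the retrieval leg of this path, the snippet below queries a Milvus collection with the pymilvus client. The endpoint URI, collection name, payload field, and embedding dimension are assumptions for illustration; in the RAG Blueprint this step is handled by NVIDIA NeMo Retriever and NIM microservices rather than hand-written client code.

```python
from pymilvus import MilvusClient

# Hypothetical Milvus endpoint; in a deployment this would point at the
# vector database backed by networked storage.
client = MilvusClient(uri="http://milvus.example.internal:19530")

def retrieve_context(query_embedding: list[float], top_k: int = 5) -> list[str]:
    """Fetch the top-k most similar passages for a query embedding.

    The search call is the north-south hop: the embedding travels to the
    vector database and the matched passages travel back to the GPU node.
    """
    results = client.search(
        collection_name="enterprise_docs",  # assumed collection name
        data=[query_embedding],
        limit=top_k,
        output_fields=["text"],             # assumed payload field
    )
    # results holds one hit list per query vector.
    return [hit["entity"]["text"] for hit in results[0]]

if __name__ == "__main__":
    # The embedding would normally come from the embedding model; a dummy
    # 1,024-dimensional vector stands in for it here.
    print(retrieve_context([0.0] * 1024))
```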

Figure 1. Step-by-step flow of a RAG-enhanced LLM user query through the NVIDIA Spectrum-X Ethernet networking platform

Let’s examine the north-to-south user-compute-storage flow:

    User query ingress (user to Internet to leaf): A user prompt or task flows into the AI factory through an ingress gateway, hits the leaf switch, and descends into the cluster. Enterprise RAs streamline this path with Spectrum-X Ethernet, reducing the time to first token (TTFT) for applications relying on external data and avoiding manual network configuration tuning.
    Request routed to GPU Server (leaf to GPU via DPU): The request is directed to a GPU node by the leaf switch, where a BlueField-3 DPU handles packet parsing, offloads the networking stack, and routes the query to the correct inference engine (e.g., NVIDIA NIM). The request flows across the leaf-spine Spectrum-X Ethernet network switch using adaptive routing to avoid congestion. Spectrum-X Ethernet uses the real-time state of the switch or queue occupancy to dynamically keep traffic flowing efficiently, similar to how a map app reroutes you around traffic jams.
    External context fetch (server to leaf to spine to leaf to storage): For context queries (e.g., vector databases), the request flows through the leaf-spine fabric via RoCE (RDMA over Converged Ethernet) to an NVMe-based storage system. Spectrum-X Ethernet features seamless interoperability and optimized performance for AI workloads accessing data on partner platforms such as DDN, VAST Data, and WEKA, delivering up to 1.6x faster storage performance.
    Data returned to GPU (storage to leaf to spine to leaf to server): The relevant vectors and embedded content are returned over the same converged fabric via RoCE. Spectrum-X Ethernet enables this path to be congestion-aware, with the DPU handling packet reordering to keep the GPU fed efficiently. Here, QoS markings can ensure that latency-sensitive storage data is prioritized, especially when many AI agents are querying multiple tools over north-south traffic.
    LLM inference and final response (GPU to leaf to user): With both the original prompt and relevant external context in memory, the GPU completes inference. The final response is routed upwards and exits the infrastructure back to the user application. VRF-based network isolation ensures storage, inference, and user traffic stay logically independent, ensuring stable performance at scale.
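
One practical way to observe the effect of this end-to-end path is to measure time to first token at the application edge. The sketch below assumes the inference engine exposes an OpenAI-compatible endpoint, as NVIDIA NIM LLM microservices do; the host, port, and model name are hypothetical. Comparing TTFT for queries that do and do not require external context helps isolate the contribution of the north-south retrieval hop.

```python
import time
from openai import OpenAI

# Hypothetical NIM endpoint and model name, used only to show the measurement.
client = OpenAI(base_url="http://nim.example.internal:8000/v1", api_key="not-used")

def time_to_first_token(prompt: str, model: str = "meta/llama-3.1-8b-instruct") -> float:
    """Return seconds from request submission to the first streamed token.

    TTFT measured here includes every hop in the flow above: ingress,
    retrieval of external context (if the serving pipeline performs it),
    and the start of LLM inference.
    """
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return float("nan")

if __name__ == "__main__":
    print(f"TTFT: {time_to_first_token('Summarize our Q3 storage costs.'):.3f} s")
```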

In environments where several AI agents operate concurrently—collaborating to solve complex tasks or serving multiple user queries—efficient north-south networking prevents bottlenecks and maintains a fluid, responsive system. By streamlining these retrieval processes, enterprises unlock faster decision-making and improved user experiences. Whether in customer support chatbots, financial advisory tools, or internal knowledge management platforms, agentic AI and RAG architectures powered by efficient north-south networks deliver tangible business value.

In conclusion, AI workloads are no longer confined to massive training clusters tucked away in isolated environments. Increasingly, they are embedded into the fabric of everyday enterprise operations, requiring seamless interaction with data lakes, external services, and user-facing applications. In this new paradigm, north-south networks are making a comeback as the heroes of AI factories. With the combined strengths of NVIDIA Spectrum-X Ethernet, NVIDIA BlueField, and thoughtful NVIDIA Enterprise RA-based designs, organizations can ensure their AI factories are resilient, performant, and ready to scale as AI workloads evolve.

For additional information about solutions based on NVIDIA Enterprise RAs, please consult your NVIDIA-Certified partner for tailored deployment strategies.

