Nvidia Developer 10月29日 03:27
NVIDIA推出Nemotron系列新模型,赋能智能代理AI
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

NVIDIA发布了Nemotron系列新模型,包括Nemotron Nano 3、Nemotron Nano 2 VL、Nemotron Parse 1.1以及Nemotron RAG和Llama 3.1 Nemotron Safety Guard。这些模型旨在帮助开发者构建更强大、更安全的智能代理AI系统,涵盖了推理、多模态理解、文档处理、检索增强生成(RAG)以及内容安全等关键领域。Nemotron Nano 3以其高效的MoE架构提升了推理能力,Nemotron Nano 2 VL则通过混合Mamba-Transformer架构增强了视觉语言理解能力,特别是在文档和视频处理方面。Nemotron Parse 1.1则专注于高效的文档结构化信息提取。此外,Nemotron RAG提供了企业级的数据隐私和安全连接能力,而Llama 3.1 Nemotron Safety Guard则通过多语言内容安全模型,确保AI的负责任使用。这些新工具和模型通过NVIDIA NeMo生态系统提供,支持开发者进行模型评估、优化和部署,加速AI应用的开发进程。

🚀 **Nemotron系列模型提升AI代理能力**:NVIDIA发布了Nemotron Nano 3、Nemotron Nano 2 VL、Nemotron Parse 1.1等一系列模型,旨在赋能开发者构建更智能、更高效的代理AI系统。Nemotron Nano 3通过MoE架构优化了推理能力,Nemotron Nano 2 VL则在多模态理解方面表现出色,能够处理文本、图像、表格和视频,尤其适用于文档智能和视频理解。Nemotron Parse 1.1则专注于从文档中提取结构化文本和表格,提升文档处理效率。

🔒 **加强AI安全与可靠性**:为了应对AI系统可能带来的风险,NVIDIA推出了Llama 3.1 Nemotron Safety Guard,这是一个多语言内容安全模型,能够检测和过滤23个类别的有害内容,并支持九种语言,确保AI应用的负责任使用。同时,Nemotron RAG模型套件为构建检索增强生成(RAG)管道提供了支持,确保数据隐私和安全的企业级连接,为构建可信赖的AI应用奠定基础。

🛠️ **NVIDIA NeMo生态系统加速开发与部署**:NVIDIA NeMo提供了评估、优化和部署AI模型的全面工具。NeMo Evaluator SDK支持可复现的基准测试,而NeMo Agent Toolkit及其Agent Optimizer功能则能自动调整模型参数,优化AI代理的性能。这些工具和模型,包括开放的数据集和详细的教程,为开发者从原型设计到大规模部署提供了端到端的支持,降低了开发门槛,加速了创新。

💡 **开放模型与高效推理**:NVIDIA强调了开放模型、高质量数据集、优化模型精度和计算效率以及高效推理的重要性。Nemotron系列模型,如Nemotron Nano 3的MoE架构和Nemotron Nano 2 VL的EVS方法,都旨在提高吞吐量、降低延迟和计算成本。这些模型已在Hugging Face等平台上提供,并支持多种推理引擎,便于开发者集成和应用。


Agentic AI
is an ecosystem where specialized language and vision models work together. They handle planning, reasoning, retrieval, and safety guardrailing.

Developers need specialized AI agents for domain-specific workflows, real-world deployment, and compliance. Building specialized AI requires four critical ingredients: open models that can be fine-tuned, robust datasets, recipes for optimum model accuracy and compute, and efficient inference for deploying them at scale.

At NVIDIA GTC DC, we’re unveiling reasoning, vision-language, retrieval-augmented generation (RAG), and safety models with open data and recipes that deliver accuracy, compute efficiency, and openness.

This blog covers the features, performance, and tutorials on using the new Nemotron models for building multimodal agents, RAG pipelines, and AI with content safety. 

Figure 1. New Nemotron models for document intelligence, video understanding, multilingual content safety, and information retrieval

Enable agents to think efficiently with NVIDIA Nemotron Nano 3

The NVIDIA Nemotron Nano 3 is an efficient and accurate 32B parameter MoE with 3.6B active parameters designed for developers to build specialized agentic AI systems. Available soon, this model delivers higher throughput compared to similarly-sized dense models, enabling it to explore a larger search space, do better self-reflection, and provide higher accuracy across scientific reasoning, coding, math, and tool-calling benchmarks. Additionally, the MoE architecture reduces compute costs and latency.

Add multimodal understanding and reasoning with NVIDIA Nemotron Nano 2 VL

NVIDIA Nemotron Nano 2 VL, a leading model on OCRBenchV2, is an open 12B multimodal reasoning model for document intelligence and video understanding. It enables AI assistants to extract, interpret, and act on information across text, images, tables, and videos. This makes the model valuable for agents focused on data analysis, document processing, and visual understanding in applications like generating reports, curating videos, and dense captioning for media asset management and retrieval-augmented search. 

Video 1. Building multimodal AI agents for document and video intelligence using NVIDIA Nemotron VLMs

At its core, this vision-language model (VLM) features a hybrid Mamba-Transfomer architecture delivering on-par accuracy, high token throughput, and low latency for efficient large-scale reasoning for visual and text tasks. This model is trained on the Nemotron VLM Dataset V2 with over 11M high-quality samples covering several tasks such as image Q&A, OCR, dense captioning, video Q&A, and multi-image reasoning. We used FP8 for faster speed and context parallelism to manage longer inputs, leading to greater efficiency and accuracy for video and long-document tasks.

This model introduces the Efficient Video Sampling (EVS) method that identifies and prunes temporally static patches in video sequences. EVS reduces token redundancy, preserving essential semantics, for the model to process longer clips and deliver results more swiftly.

Figure 3. EVS enables Nemotron Nano 2 VL to achieve up to 2.5x higher throughput without sacrificing accuracy

Quantized for FP4, FP8, and BF16, this model is supported by vLLM and TRT-LLM inference engines and is available as an NVIDIA NIM. Developers can use the NVIDIA AI Blueprint for video search and summarization (VSS) to analyze long videos and NVIDIA NeMo to curate multimodal datasets and customize or build their own models. The technical report also guides developers on the models for building custom, optimized models with Nemotron techniques.

Improve document intelligence with NVIDIA Nemotron Parse 1.1

We’re also releasing NVIDIA Nemotron Parse 1.1, a compact 1B parameter VLM-based document parser for enhanced document intelligence. Given an image, this model extracts structured text and tables with bounding boxes and semantic classes, enabling downstream applications such as improved retriever accuracy, richer large language model (LLM) training data, and improved document processing pipelines.

Figure 4. Nemotron Parse 1.1 delivers leading accuracy on the PubTabNet benchmark for image-based table recognition

Nemotron Parse delivers comprehensive text, tables, and layout understanding for use in retriever and curator workflows. Its extraction datasets and structured outputs support both LLM and VLM training, and boost inference accuracy for VLMs at runtime.

Ground agents with open RAG models

NVIDIA Nemotron RAG is a suite of models for building RAG pipelines and real-time business insights. It ensures data privacy and connects securely to proprietary data across environments, supporting enterprise-grade retrieval. As a core component of NVIDIA AI-Q and the NVIDIA RAG Blueprint, Nemotron RAG provides a scalable and production-ready foundation for intelligent, retrieval-based AI applications.

It enables the development of a wide range of applications—from multi-agent systems where AI agents perceive, plan, and act to achieve complex goals, to generative co-pilots powered by specialized large language models that assist with IT support, HR operations, and customer service. It also supports AI assistants that interact naturally with developers using company data and summarization tools that create written reports or visual media highlights.

The embedding models have consistently led on industry leaderboards like ViDoRe and MTEB for visual and multimodal retrieval, MMTEB for multilingual text retrieval, making them well-suited for building best-in-class RAG pipelines. The new models are now available on Hugging Face.

Video 2. Developing custom AI agents powered with information retrieval using NVIDIA Nemotron RAG

Make AI safer with the Llama 3.1 Nemotron Safety Guard

As developers build agentic AI systems that can reason, retrieve, and act autonomously, safety becomes essential to prevent harmful or unintended behavior. LLMs can be misused, prompted into unsafe outputs, or miss cultural nuance—especially in non-English contexts—making reliable moderation models critical to responsible development.

The new Llama 3.1 Nemotron Safety Guard 8B V3 is a multilingual content safety model. It’s fine-tuned on the Nemotron Safety Guard dataset, a culturally diverse dataset with more than 386K samples covering 23 regionally adapted safety categories, including examples of adversarial and jailbreak prompts within each category.

The model detects unsafe or policy-violating content in both prompts and responses across 23 safety categories and nine languages, such as Arabic, Hindi, and Japanese. Figure 4 illustrates our model’s performance comparison on a per-language basis. 

Figure 5. A comparison of the Llama 3.1 Nemotron Safety Guard model performance across languages

The model achieves 84.2% harmful content classification accuracy with minimal latency, as seen in Figure 5. Two novel techniques power its performance: 1) LLM-driven cultural adaptation aligns prompts and responses with local idioms and sensitivities, and 2) consistency filtering removes noisy or misaligned samples for high-quality fine-tuning.

Figure 6. In benchmark testing across eight datasets, the Llama 3.1 Nemotron Safety Guard model delivers best-in-class performance across 23 safety categories

Lightweight and deployable on a single GPU or as an NVIDIA NIM, it integrates with NeMo Guardrails for real-time, multilingual content safety in agentic AI pipelines. Explore the model and dataset on HuggingFace or build.nvidia.com to start building safer, globally aligned AI systems.

Video 3. Power AI with culturally-aware LLM guardrails using Nemotron Safety Guard

Evaluate your models and optimize AI agents with NVIDIA NeMo

To ensure LLM capabilities are measured reliably, the NVIDIA NeMo Evaluator SDK was recently open sourced. This SDK enables reproducible benchmarking, giving developers confidence in real-world performance beyond reported scores. 

NeMo Evaluator can now also assess models on dynamic, interactive workflows with support for ProfBench, a benchmark suite designed to evaluate agentic AI behaviors, including multi-step reasoning and tool usage. 

By open-sourcing standardized evaluation setups, developers can benchmark performance, validate outputs, and compare models under consistent conditions. 

NeMo Agent Toolkit is an open-source framework integrated with industry standards like MCP and compatible with other frameworks, including Semantic Kernel, Google ADK, LangChain, and CrewAI. The toolkit’s new Agent Optimizer feature automatically tunes key hyperparameters—LLM type, temperature, max tokens—and optimizes for accuracy, groundedness, latency, token usage, and custom metrics. This reduces trial-and-error and accelerates agent, tool, and workflow development. 

See Agent Optimizer in action and try it now with our GitHub notebook.

Start building your AI with Nemotron now 

In this blog post, we’ve introduced the newest members of the Nemotron family and a small sample of what is possible with them.

To get started, download the Nemotron models and datasets from Hugging Face. 

Nemotron Nano 2 VL is also hosted by inference providers including Baseten, Deep Infra, Fireworks, Hyperbolic, Nebius, and Replicate to provide  an efficient path from development to production for agentic AI.

You can also evaluate the NVIDIA-hosted API endpoints on build.nvidia.com and OpenRouter.

Stay up to date on NVIDIA Nemotron by subscribing to NVIDIA news and following NVIDIA AI on LinkedIn, X, Discord, and YouTube.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

NVIDIA Nemotron Agentic AI AI Models Multimodal AI RAG AI Safety NVIDIA GTC DC
相关文章