NVIDIA Launches New Nemotron Family Models to Power Agentic AI

Agentic AI is an ecosystem where specialized language and vision models work together. They handle planning, reasoning, retrieval, and safety guardrailing.

Developers need specialized AI agents for domain-specific workflows, real-world deployment, and compliance. Building specialized AI requires four critical ingredients: open models that can be fine-tuned, robust datasets, recipes for optimum model accuracy and compute, and efficient inference for deploying them at scale.

At NVIDIA GTC DC, we’re unveiling reasoning, vision-language, retrieval-augmented generation (RAG), and safety models with open data and recipes that deliver accuracy, compute efficiency, and openness.

This blog covers the features, performance, and tutorials on using the new Nemotron models for building multimodal agents, RAG pipelines, and AI with content safety.

*Figure 1. New Nemotron models for document intelligence, video understanding, multilingual content safety, and information retrieval*

Enable agents to think efficiently with NVIDIA Nemotron Nano 3

NVIDIA Nemotron Nano 3 is an efficient and accurate 32B-parameter mixture-of-experts (MoE) model with 3.6B active parameters, designed for developers building specialized agentic AI systems. Available soon, this model delivers higher throughput than similarly sized dense models, enabling it to explore a larger search space, self-reflect more effectively, and provide higher accuracy across scientific reasoning, coding, math, and tool-calling benchmarks. Additionally, the MoE architecture reduces compute costs and latency.
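
To make the gap between total and active parameters concrete, here is a minimal sketch of top-k expert routing, the general idea behind MoE layers. The expert count, the `top_k` value, the router scores, and all function names are illustrative assumptions, not details of Nemotron Nano 3.

```python
import math

def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k=2):
    """Pick the top_k experts for one token and renormalize their weights.

    Only the selected experts run, so compute per token scales with
    top_k rather than with the total number of experts.
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    weight_sum = sum(probs[i] for i in chosen)
    return {i: probs[i] / weight_sum for i in chosen}

def active_fraction(active_params_b, total_params_b):
    """Share of parameters used per token, from counts in billions."""
    return active_params_b / total_params_b

# Hypothetical router scores for one token over 8 experts.
weights = route_token([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], top_k=2)
print(sorted(weights))                      # indices of the experts that run
print(round(active_fraction(3.6, 32), 4))   # share of parameters active per token
```

With the figures quoted above, 3.6B of 32B parameters works out to roughly 11% of the model being active for any given token.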

Add multimodal understanding and reasoning with NVIDIA Nemotron Nano 2 VL

NVIDIA Nemotron Nano 2 VL, a leading model on OCRBenchV2, is an open 12B multimodal reasoning model for document intelligence and video understanding. It enables AI assistants to extract, interpret, and act on information across text, images, tables, and videos. This makes the model valuable for agents focused on data analysis, document processing, and visual understanding in applications like generating reports, curating videos, and dense captioning for media asset management and retrieval-augmented search.

*Video 1. Building multimodal AI agents for document and video intelligence using NVIDIA Nemotron VLMs*

At its core, this vision-language model (VLM) features a hybrid Mamba-Transformer architecture delivering on-par accuracy, high token throughput, and low latency for efficient large-scale reasoning on visual and text tasks. The model is trained on the Nemotron VLM Dataset V2, which has over 11M high-quality samples covering tasks such as image Q&A, OCR, dense captioning, video Q&A, and multi-image reasoning. We used FP8 for speed and context parallelism to manage longer inputs, leading to greater efficiency and accuracy for video and long-document tasks.

*Figure 3. EVS enables Nemotron Nano 2 VL to achieve up to 2.5x higher throughput without sacrificing accuracy*

Quantized for FP4, FP8, and BF16, this model is supported by the vLLM and TRT-LLM inference engines and is available as an NVIDIA NIM. Developers can use the NVIDIA AI Blueprint for video search and summarization (VSS) to analyze long videos and NVIDIA NeMo to curate multimodal datasets and customize or build their own models. The technical report also offers guidance on building custom, optimized models with Nemotron techniques.

Improve document intelligence with NVIDIA Nemotron Parse 1.1

We’re also releasing NVIDIA Nemotron Parse 1.1, a compact 1B parameter VLM-based document parser for enhanced document intelligence. Given an image, this model extracts structured text and tables with bounding boxes and semantic classes, enabling downstream applications such as improved retriever accuracy, richer large language model (LLM) training data, and improved document processing pipelines.

Nemotron Parse delivers comprehensive text, tables, and layout understanding for use in retriever and curator workflows. Its extraction datasets and structured outputs support both LLM and VLM training, and boost inference accuracy for VLMs at runtime.
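
As a rough illustration of how structured parser output can feed a retriever, the sketch below models each extracted region as text plus a semantic class and a bounding box, then orders and filters the regions into chunks. The `ParsedElement` fields, the class names, and the helper functions are hypothetical and do not reflect the actual Nemotron Parse output schema.

```python
from dataclasses import dataclass

@dataclass
class ParsedElement:
    """One extracted region. Field names are hypothetical, not the model's schema."""
    text: str
    semantic_class: str   # e.g. "title", "paragraph", "table", "footer"
    bbox: tuple           # (x_min, y_min, x_max, y_max) in page pixels

def reading_order(elements):
    """Sort elements top-to-bottom, then left-to-right, by bounding box."""
    return sorted(elements, key=lambda e: (e.bbox[1], e.bbox[0]))

def chunks_for_retrieval(elements, keep=("title", "paragraph", "table")):
    """Turn parsed elements into text chunks tagged with their class."""
    return [
        {"class": e.semantic_class, "text": e.text}
        for e in reading_order(elements)
        if e.semantic_class in keep
    ]

# Toy page: elements arrive out of order and include a footer to drop.
page = [
    ParsedElement("Revenue grew in Q3.", "paragraph", (40, 120, 560, 160)),
    ParsedElement("Quarterly Report", "title", (40, 30, 560, 70)),
    ParsedElement("page 1", "footer", (40, 760, 560, 780)),
]
chunks = chunks_for_retrieval(page)
print(chunks)
```

Keeping the semantic class on each chunk lets a downstream pipeline treat tables and body text differently, which is one way the bounding boxes and classes described above can improve retriever accuracy.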

Ground agents with open RAG models

NVIDIA Nemotron RAG is a suite of models for building RAG pipelines and real-time business insights. It ensures data privacy and connects securely to proprietary data across environments, supporting enterprise-grade retrieval. As a core component of NVIDIA AI-Q and the NVIDIA RAG Blueprint, Nemotron RAG provides a scalable and production-ready foundation for intelligent, retrieval-based AI applications.

It enables the development of a wide range of applications—from multi-agent systems where AI agents perceive, plan, and act to achieve complex goals, to generative co-pilots powered by specialized large language models that assist with IT support, HR operations, and customer service. It also supports AI assistants that interact naturally with developers using company data and summarization tools that create written reports or visual media highlights.

The embedding models have consistently led industry leaderboards such as ViDoRe and MTEB for visual and multimodal retrieval and MMTEB for multilingual text retrieval, making them well suited for building best-in-class RAG pipelines. The new models are now available on Hugging Face.
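
For readers new to embedding-based retrieval, the following is a minimal sketch of the retrieval step in a RAG pipeline: rank documents by cosine similarity between a query vector and stored document vectors. The three-dimensional vectors and document IDs are made up for illustration; in practice the vectors would come from an embedding model, and this sketch does not call any Nemotron RAG API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy index: hypothetical document IDs mapped to made-up embeddings.
index = {
    "vpn-setup": [0.9, 0.1, 0.0],
    "leave-policy": [0.0, 0.2, 0.9],
    "laptop-refresh": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]   # stand-in for an embedded IT-support question
top = retrieve(query, index, k=2)
print([doc_id for doc_id, _ in top])
```

The retrieved passages would then be placed in the LLM prompt as grounding context. A production system would typically replace the linear scan with a vector database.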

Make AI safer with the Llama 3.1 Nemotron Safety Guard

As developers build agentic AI systems that can reason, retrieve, and act autonomously, safety becomes essential to prevent harmful or unintended behavior. LLMs can be misused, prompted into unsafe outputs, or miss cultural nuance—especially in non-English contexts—making reliable moderation models critical to responsible development.

The new Llama 3.1 Nemotron Safety Guard 8B V3 is a multilingual content safety model. It’s fine-tuned on the Nemotron Safety Guard dataset, a culturally diverse dataset with more than 386K samples covering 23 regionally adapted safety categories, including examples of adversarial and jailbreak prompts within each category.

The model detects unsafe or policy-violating content in both prompts and responses across 23 safety categories and nine languages, such as Arabic, Hindi, and Japanese. Figure 4 illustrates our model’s performance comparison on a per-language basis.

The model achieves 84.2% harmful content classification accuracy with minimal latency, as seen in Figure 5. Two novel techniques power its performance: 1) LLM-driven cultural adaptation aligns prompts and responses with local idioms and sensitivities, and 2) consistency filtering removes noisy or misaligned samples for high-quality fine-tuning.
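
The post does not spell out how consistency filtering is implemented. One plausible reading, sketched below purely as an assumption, is to keep a culturally adapted sample only when the safety label assigned to the adapted text agrees with the label of the source text. The field names `source_label` and `adapted_label` are hypothetical.

```python
def consistency_filter(samples):
    """Split samples into kept and dropped lists.

    Assumption: a sample is consistent when the label judged on the
    adapted text matches the label of the original source text.
    """
    kept, dropped = [], []
    for sample in samples:
        if sample["source_label"] == sample["adapted_label"]:
            kept.append(sample)
        else:
            dropped.append(sample)
    return kept, dropped

# Toy data: the second sample changed meaning during adaptation.
samples = [
    {"id": 1, "source_label": "unsafe", "adapted_label": "unsafe"},
    {"id": 2, "source_label": "unsafe", "adapted_label": "safe"},
    {"id": 3, "source_label": "safe", "adapted_label": "safe"},
]
kept, dropped = consistency_filter(samples)
print([s["id"] for s in kept], [s["id"] for s in dropped])
```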

Lightweight and deployable on a single GPU or as an NVIDIA NIM, it integrates with NeMo Guardrails for real-time, multilingual content safety in agentic AI pipelines. Explore the model and dataset on Hugging Face or build.nvidia.com to start building safer, globally aligned AI systems.
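
The pattern of moderating both prompts and responses can be sketched as a thin wrapper around generation. This sketch does not use the NeMo Guardrails API or the Safety Guard model: `generate` and `classify` are hypothetical callables supplied by the caller, and the toy stand-ins exist only so the example runs.

```python
REFUSAL = "Sorry, I can't help with that."

def guarded_reply(prompt, generate, classify, refusal=REFUSAL):
    """Check the prompt before generation and the response after it.

    generate(prompt) -> str and classify(text) -> bool (True if unsafe)
    are hypothetical callables, not a real library interface.
    """
    if classify(prompt):
        return refusal          # block unsafe input before calling the LLM
    response = generate(prompt)
    if classify(response):
        return refusal          # block unsafe output before it reaches the user
    return response

# Toy stand-ins so the sketch runs without any model.
def toy_classify(text):
    return "forbidden" in text.lower()

def toy_generate(prompt):
    return f"Echo: {prompt}"

print(guarded_reply("hello", toy_generate, toy_classify))
print(guarded_reply("a FORBIDDEN request", toy_generate, toy_classify))
```

In a real deployment, `classify` would call a content safety model and could also return the violated category so the refusal can be logged or tailored.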

Evaluate your models and optimize AI agents with NVIDIA NeMo

To ensure LLM capabilities are measured reliably, the NVIDIA NeMo Evaluator SDK was recently open sourced. This SDK enables reproducible benchmarking, giving developers confidence in real-world performance beyond reported scores.

NeMo Evaluator can now also assess models on dynamic, interactive workflows with support for ProfBench, a benchmark suite designed to evaluate agentic AI behaviors, including multi-step reasoning and tool usage.

With standardized evaluation setups now open source, developers can benchmark performance, validate outputs, and compare models under consistent conditions.

NeMo Agent Toolkit is an open-source framework integrated with industry standards like MCP and compatible with other frameworks, including Semantic Kernel, Google ADK, LangChain, and CrewAI. The toolkit’s new Agent Optimizer feature automatically tunes key hyperparameters—LLM type, temperature, max tokens—and optimizes for accuracy, groundedness, latency, token usage, and custom metrics. This reduces trial-and-error and accelerates agent, tool, and workflow development.
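
To show what automated hyperparameter tuning replaces, here is a minimal sketch that scores every combination of LLM type, temperature, and max tokens and keeps the best one. It does not use the NeMo Agent Toolkit API, and exhaustive grid search is a stand-in: the post does not say which search strategy the Agent Optimizer uses. The model names and the scoring function are invented for illustration.

```python
from itertools import product

def tune(search_space, evaluate):
    """Score every configuration in the grid and return the best one."""
    names = list(search_space)
    best_config, best_score = None, float("-inf")
    for values in product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

def toy_evaluate(config):
    """Hypothetical metric: accuracy minus penalties for randomness and length."""
    accuracy = {"small-llm": 0.70, "large-llm": 0.85}[config["llm"]]
    accuracy -= 0.10 * config["temperature"]
    latency_penalty = 0.0001 * config["max_tokens"]
    return accuracy - latency_penalty

space = {
    "llm": ["small-llm", "large-llm"],
    "temperature": [0.0, 0.5, 1.0],
    "max_tokens": [256, 1024],
}
best, score = tune(space, toy_evaluate)
print(best, round(score, 4))
```

In practice `evaluate` would run the agent on a test set and combine metrics such as accuracy, groundedness, latency, and token usage into a single score.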

Start building your AI with Nemotron now

In this blog post, we’ve introduced the newest members of the Nemotron family and a small sample of what is possible with them.

Nemotron Nano 2 VL is also hosted by inference providers including Baseten, Deep Infra, Fireworks, Hyperbolic, Nebius, and Replicate to provide an efficient path from development to production for agentic AI.
