NVIDIA Drives Innovation in Autonomous Driving Data Processing and Simulation

Autonomous vehicle (AV) stacks are evolving from a hierarchy of discrete building blocks to end-to-end architectures built on foundation models. This transition demands an AV data flywheel that can generate synthetic data, augment sensor datasets, address coverage gaps, and, ultimately, build a validation toolchain to safely develop and deploy autonomous vehicles.

In this blog post, we highlight the latest NVIDIA Omniverse and NVIDIA Cosmos workflows, models, and datasets that developers can use to kickstart their data pipelines.

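Specifically, this post covers processing massive datasets for AV testing and validation, neural reconstruction for AV simulation, diversifying data with world models, and integrating neural reconstruction and world models into simulation pipelines.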

Processing massive datasets for AV testing and validation

The reasoning vision-language-action (VLA) models that power the AV stack require massive amounts of driving data for both pre-training and post-training. As the stack matures, the data must become more targeted to address edge cases and weaknesses.

Data collected in the real world is at the core of these training and post-training datasets. To help kickstart development, NVIDIA has released one of the world’s largest multimodal AV datasets, featuring over 1,700 hours of camera, radar, and lidar data, organized into 20-second clips, covering urban driving scenarios across more than 2,500 cities and 25 countries. The scenes span a variety of traffic densities, weather conditions, and times of day, along with infrastructure elements such as tunnels, bridges, roundabouts, railway crossings, toll booths, inclines, and more.
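For a rough sense of scale, the stated duration and clip length imply a lower bound on the number of clips. The sketch below is simple arithmetic, not a published clip count: it assumes every clip is exactly 20 seconds long and treats "over 1,700 hours" as a floor.

```python
HOURS = 1_700        # "over 1,700 hours", treated as a lower bound
CLIP_SECONDS = 20    # each clip is assumed to be exactly 20 seconds long


def min_clip_count(hours: float, clip_seconds: float) -> int:
    """Lower bound on the number of clips, assuming uniform clip length."""
    return int(hours * 3600 // clip_seconds)


print(min_clip_count(HOURS, CLIP_SECONDS))  # 306000
```

In other words, the dataset holds on the order of 300,000 clips or more, which is why automated filtering and search matter in the steps that follow.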

This data, which was used to develop NVIDIA’s internal reasoning VLA model for autonomous driving, can be used for training and post-training, and can be scaled into larger datasets using the synthetic data generation workflows described in this post.

Once data has been collected and generated, it must be processed into useful clips. Cosmos Curator uses Cosmos Reason, an open reasoning vision-language model (VLM), to quickly filter, annotate, and deduplicate large amounts of sensor data. Cosmos Reason is available as an NVIDIA NIM microservice; NIM offers secure, easy-to-use microservices for deploying high-performance generative AI across any environment.
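Cosmos Curator’s internals are not described here, but deduplication of this kind is commonly framed as near-duplicate detection over clip embeddings. The following is a minimal sketch under that assumption only: the embedding vectors, clip IDs, and the 0.95 cosine-similarity threshold are hypothetical, and a production pipeline would use GPU-accelerated approximate search rather than this quadratic loop.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dedup_clips(embeddings: Dict[str, List[float]], threshold: float = 0.95) -> List[str]:
    """Greedy near-duplicate removal: keep a clip only if it is below the
    similarity threshold against every clip already kept."""
    kept: List[str] = []
    for clip_id, emb in embeddings.items():
        if all(cosine(emb, embeddings[k]) < threshold for k in kept):
            kept.append(clip_id)
    return kept


# Hypothetical embeddings produced by some video embedding model.
clips = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.99, 0.01, 0.0],  # near-duplicate of clip_a
    "clip_c": [0.0, 1.0, 0.0],
}
print(dedup_clips(clips))  # ['clip_a', 'clip_c']
```

The greedy pass keeps the first clip it sees from each cluster of near-duplicates; a real curator might instead keep the highest-quality clip per cluster.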

You can then create datasets containing targeted ego or actor behavior for specific post-training tasks—such as left turns at busy intersections—in a matter of seconds using Cosmos Dataset Search (CDS), a GPU-accelerated vector search workflow that quickly embeds and searches video datasets.
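Conceptually, a search like this embeds each clip once, embeds the text, image, or video query into the same vector space, and returns the clips whose embeddings lie closest to the query. The sketch below shows only that retrieval step as brute-force cosine similarity in plain Python. It is not the CDS API; the three-dimensional embeddings and clip IDs are hypothetical stand-ins for the output of a real multimodal embedding model.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def search(index: Dict[str, Sequence[float]], query: Sequence[float], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k clips whose embeddings are most similar to the query."""
    q = normalize(query)
    scored = (
        (clip_id, sum(a * b for a, b in zip(q, normalize(embedding))))
        for clip_id, embedding in index.items()
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Hypothetical embedding dimensions: [left turn, busy intersection, highway].
index = {
    "clip_001": [0.9, 0.8, 0.0],
    "clip_002": [0.1, 0.0, 1.0],
    "clip_003": [0.7, 0.2, 0.1],
}
query = [1.0, 1.0, 0.0]  # stand-in for "left turn at a busy intersection"
print([clip_id for clip_id, _ in search(index, query)])  # ['clip_001', 'clip_003']
```

At petabyte scale, the linear scan would typically be replaced by an approximate nearest-neighbor index on the GPU, but the ranking idea stays the same.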

*Figure 1. Cosmos Dataset Search instantly retrieves scenarios based on text, image, or video prompts to create targeted datasets for post-training.*

Neural reconstruction for AV simulation

Using neural reconstruction and rendering, two advanced 3D reconstruction techniques, developers can turn real-world datasets into interactive, high-fidelity simulations.

NVIDIA Omniverse NuRec

NVIDIA Omniverse NuRec is a set of technologies for neural reconstruction and rendering. It enables developers to use their existing fleet data to reconstruct high-fidelity digital twins, simulate new events, and render sensor data from novel points of view. NuRec’s libraries, models, and tools enable developers to:

Prepare and process sensor data for reconstruction
Reconstruct sensor data into 3D representations
Perform Gaussian-based rendering
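Gaussian-based rendering generally composites depth-sorted Gaussians front to back: each Gaussian contributes its color, weighted by its opacity at the pixel and by the transmittance left over from the Gaussians in front of it, so that C = Σ_i c_i α_i Π_{j<i} (1 − α_j). The sketch below evaluates that rule for a single pixel. It is the textbook 3D Gaussian splatting compositing step, not NuRec’s renderer, and it simplifies heavily: Gaussians are assumed to be already projected to 2D with isotropic footprints, and all values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Splat:
    mean: Tuple[float, float]          # projected 2D center, in pixels
    sigma: float                       # isotropic std dev in pixels (real 3DGS uses a full 2D covariance)
    opacity: float                     # learned opacity in [0, 1]
    color: Tuple[float, float, float]  # RGB color
    depth: float                       # view-space depth, used for sorting


def render_pixel(splats: List[Splat], px: float, py: float) -> Tuple[float, float, float]:
    """Front-to-back alpha compositing of Gaussian splats at one pixel."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in sorted(splats, key=lambda s: s.depth):  # nearest splat first
        dx, dy = px - s.mean[0], py - s.mean[1]
        alpha = s.opacity * math.exp(-(dx * dx + dy * dy) / (2.0 * s.sigma ** 2))
        for c in range(3):
            color[c] += transmittance * alpha * s.color[c]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early termination once the pixel is effectively opaque
            break
    return tuple(color)


near = Splat(mean=(0.0, 0.0), sigma=1.0, opacity=0.5, color=(1.0, 0.0, 0.0), depth=1.0)
far = Splat(mean=(0.0, 0.0), sigma=1.0, opacity=1.0, color=(0.0, 0.0, 1.0), depth=5.0)
print(render_pixel([far, near], 0.0, 0.0))  # (0.5, 0.0, 0.5)
```

The half-transparent red splat in front lets half the light through, so the opaque blue splat behind it contributes the other half of the pixel color.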

NuRec also includes generative AI models to enhance the quality of reconstructions for more robust simulation. NuRec Fixer is a transformer-based model post-trained on AV datasets to inpaint and resolve reconstruction artifacts. Developers can run Fixer during reconstruction or as a post-process during neural rendering to fix such artifacts. Fixer is based on the Difix3D+ paper released at CVPR 2025. With Fixer, novel view synthesis from reconstructed scenes becomes practical for open- and closed-loop simulation workflows.

Video 1. NVIDIA NuRec Fixer addresses artifacts in reconstruction for higher-quality sensor simulation from real-world drives

Diversify with world models

We can further scale data and amplify variation in simulation using world models that allow intelligent systems to simulate, predict, and interact with their environments.

NVIDIA Cosmos Predict and Transfer

The Cosmos Predict world foundation model (WFM) generates new video states from text, image, or video inputs for robotics and autonomous vehicle simulation.

Cosmos Transfer is a multi-ControlNet model built on Cosmos Predict that produces high-quality world simulations conditioned on spatial control inputs, which encode details such as road layout and object position and orientation. Users can prompt Cosmos Transfer to generate diverse weather, lighting, and terrain variations for any given scene.
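Because the spatial control inputs pin down road layout and object placement, one scene can be fanned out into many appearance variants simply by varying the text prompt. The sketch below builds such a sweep as plain data. The `TransferJob` type, its fields, the control-input file names, and the prompt template are all hypothetical illustrations, not the Cosmos Transfer interface; the Cosmos Cookbook is the place to find the actual inference entry points.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TransferJob:
    """Hypothetical description of one generation request (not the Cosmos Transfer API)."""
    scene_id: str
    control_inputs: Tuple[str, ...]  # e.g. HD-map or depth control videos for the scene
    prompt: str


def build_variation_jobs(
    scene_id: str,
    control_inputs: Sequence[str],
    weathers: Sequence[str],
    lightings: Sequence[str],
) -> List[TransferJob]:
    """Pair one fixed set of control inputs with every weather x lighting prompt."""
    return [
        TransferJob(
            scene_id=scene_id,
            control_inputs=tuple(control_inputs),  # layout stays fixed across variants
            prompt=f"The same driving scene in {weather} weather at {lighting}.",
        )
        for weather, lighting in product(weathers, lightings)
    ]


jobs = build_variation_jobs(
    "scene_0001",
    ["scene_0001_hdmap.mp4", "scene_0001_depth.mp4"],  # hypothetical file names
    weathers=["snowy", "rainy"],
    lightings=["dusk", "night"],
)
print(len(jobs))       # 4
print(jobs[0].prompt)  # The same driving scene in snowy weather at dusk.
```

Keeping the control inputs identical across jobs is the point of the design: only appearance changes, so labels derived from the original scene can be reused for every variant.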

The latest model releases, Cosmos Predict 2.5 and Cosmos Transfer 2.5, can generate up to 30 seconds of new video with camera-controllable, multi-view outputs and better adherence to control signals, meeting the needs of AV simulation.

Dive deeper with the Cosmos white paper for technical insights, and jumpstart your journey with the Cosmos Cookbook, a guide to building, customizing, and deploying Cosmos for your own autonomous-system use cases.

Integrating neural reconstruction and world models into simulation pipelines

These models and workflows have been integrated into open-source and enterprise toolchains for easy adoption in existing simulation pipelines.

CARLA open source AV simulator

CARLA is one of the world’s most popular open-source simulation platforms, with more than 150,000 active developers, and serves as a testbed for AV research and development. NVIDIA is partnering with CARLA to integrate the latest NuRec rendering APIs and the Cosmos Transfer world foundation model. This enables developers to generate sensor data from Gaussian representations with ray tracing and to amplify diversity with Cosmos world foundation models (WFMs).

Below is an example of a scene in which CARLA orchestrates the motion of all agents, including the ego vehicle, while NuRec renders sensor data from the ego point of view. By adding reconstructed scenes and simulating new events with CARLA’s APIs and traffic model integrations, developers can create useful corner-case datasets.

Cosmos Transfer integration with CARLA can then create variations of this scene for both training and testing, as can be seen below:

Video 2. Replay of a 3DGUT-reconstructed drive in CARLA using NVIDIA NuRec

Novel view generation with NVIDIA Omniverse NuRec and Cosmos Transfer

When a reconstructed scene is rendered from a novel view, gaps in the reconstruction can lead to artifacts.

Developers can try out this pipeline using over 900 reconstructed scenes available in the NVIDIA Physical AI Open Datasets. With the latest version of CARLA, developers can now author completely new trajectories, reposition the camera, and simulate drives with this starter pack of reconstructed data.
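One simple way to author a new trajectory from a recorded drive is to shift every ego pose sideways, for example by about one lane width, and then render the reconstructed scene from the shifted poses. The sketch below does this for 2D poses of the form (x, y, heading). The pose format and the 3.5 m offset are assumptions for illustration, and the code does not use the CARLA or NuRec APIs. Larger offsets move the camera further from the originally observed views, which is where reconstruction gaps and artifacts tend to appear.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, heading in radians): a 2D simplification


def lateral_shift(trajectory: List[Pose], offset_m: float) -> List[Pose]:
    """Shift every pose sideways by offset_m meters (positive = left of the travel direction)."""
    shifted: List[Pose] = []
    for x, y, yaw in trajectory:
        # Unit normal pointing to the left of the heading direction.
        nx, ny = -math.sin(yaw), math.cos(yaw)
        shifted.append((x + offset_m * nx, y + offset_m * ny, yaw))
    return shifted


recorded = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]  # straight drive along +x
novel = lateral_shift(recorded, 3.5)  # roughly one lane width to the left
print(novel)  # [(0.0, 3.5, 0.0), (10.0, 3.5, 0.0), (20.0, 3.5, 0.0)]
```

Headings are left unchanged here, so the shifted drive stays parallel to the original; a fuller sketch would also smooth the transition into and out of the offset.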

Video 3. Upper right: replay of a 3DGUT-reconstructed drive in CARLA using NuRec. Below, clockwise from left: variants of the reconstructed drive generated by Cosmos Transfer, including snow, evening, clear weather with ivy on buildings, and sunset with glare

CARLA developers using directable behavioral agent models such as Imagining The Road Ahead (ITRA) from Inverted.AI, and AV developers using the Foretellix Foretify data-automation toolchain, which is pre-integrated with CARLA and NVIDIA Cosmos, can generate realistic variations in scenarios and behaviors to scale up behavioral diversity.

Video 4. Generation of sensor data with the Cosmos-Transfer1-7B-Sample-AV [HDMap] model conditioned on text prompts and object level simulation from Foretellix with physics from CARLA

Video 5. Large-scale generation of AV sensor data with Cosmos Transfer conditioned on text prompts and outputs from CARLA and Inverted AI

Voxel51 AV Simulation Data Pipeline

FiftyOne, from Voxel51, is a visual and multimodal AI data engine that enables physical AI developers to curate, annotate, and evaluate large datasets and models for training and testing. It integrates Cosmos Dataset Search (CDS), NuRec and Cosmos Transfer to create high-quality, simulation-ready datasets, enhancing each stage of the simulation pipeline:

CDS allows users to perform fast, high-recall semantic searches over petabyte-scale video data to create targeted datasets for various downstream needs.
NuRec integration allows users to convert raw data streams to validated datasets in NuRec format and reconstruct scenes. Developers can ingest their datasets, evaluate the quality of their reconstructions, and create 3D digital twins for downstream simulation tasks.
Cosmos Transfer integration enables users to apply style transfer directly to their data, increasing their datasets’ diversity.

Stop by the Voxel51 booth (#411) at GTC DC to explore the workflow. See a first-ever demo of this end-to-end data pipeline at the launch webinar on Nov 5, 2025 at 9 AM PT.

Video 6. Replay of a real world drive from the Waymo dataset with NVIDIA NuRec in Voxel51.

Get started developing today

Stay up-to-date by subscribing to NVIDIA news and following NVIDIA Omniverse on Discord and YouTube.

Get started with developer starter kits to quickly develop and enhance your own applications and services.

Join us for Physical AI and Robotics Day at NVIDIA GTC Washington, D.C. on October 29, 2025 as we bring together developers, researchers, and technology leaders to learn how NVIDIA technologies are accelerating the next era of AI.