NVIDIA Developer, October 29, 03:27
NVIDIA Drives Innovation in Autonomous Vehicle Data Processing and Simulation

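Autonomous vehicle (AV) development is moving from layered architectures to end-to-end architectures built on foundation models. This shift requires strong data processing capabilities to generate synthetic data, augment sensor datasets, close coverage gaps, and ultimately build a validation toolchain for safely developing and deploying autonomous vehicles. NVIDIA offers solutions through its Omniverse and Cosmos workflows, including large-scale dataset processing, the driving data required by reasoning vision-language-action (VLA) models, and data filtering, annotation, and deduplication with Cosmos Reason. In addition, NVIDIA NuRec uses neural reconstruction and rendering to turn real-world data into high-fidelity digital twins for simulation and for generating new sensor data. World models such as Cosmos Predict and Cosmos Transfer further diversify simulation data, supporting scene generation under different weather, lighting, and terrain conditions. These technologies have been integrated into open-source and enterprise toolchains such as CARLA and Voxel51 FiftyOne, greatly accelerating AV development and validation.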

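🚗 **Data-driven evolution of autonomous driving**: Autonomous driving technology is evolving from a traditional architecture of discrete modules to end-to-end architectures built on foundation models. This requires an efficient data "flywheel" that generates synthetic data and augments sensor datasets to close coverage gaps, and ultimately a validation toolchain for safely developing and deploying autonomous vehicles.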

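📊 **Large-scale data processing and training**: The complex reasoning vision-language-action (VLA) models in the AV stack need massive amounts of driving data for pre-training and post-training. NVIDIA has released one of the world’s largest multimodal AV datasets, with more than 1,700 hours of urban driving data spanning a range of traffic densities, weather conditions, and geographic regions. Tools such as Cosmos Reason can quickly filter, annotate, and deduplicate large volumes of sensor data, and Cosmos Dataset Search (CDS) can quickly build datasets targeted at specific scenarios.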

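✨ **High-fidelity simulation and data diversification**: With NVIDIA Omniverse NuRec, developers can turn real-world data into high-fidelity digital twins for interactive simulation. NuRec reconstructs sensor data into 3D representations and renders them at high quality, and NuRec Fixer can repair reconstruction artifacts when they occur. World models such as Cosmos Predict and Cosmos Transfer generate diverse new video states that simulate different weather, lighting, and terrain, greatly increasing the variation in simulation data.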

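🔗 **Toolchain integration and faster development**: NVIDIA’s models and workflows have been integrated into open-source simulators such as CARLA and data engines such as Voxel51 FiftyOne, so developers can adopt them quickly in existing simulation pipelines. For example, CARLA integrates the NuRec rendering APIs and the Cosmos Transfer world model to generate ray-traced sensor data and to increase diversity with Cosmos WFMs. Voxel51 FiftyOne integrates CDS, NuRec, and Cosmos Transfer to create high-quality, simulation-ready datasets, covering data curation, reconstruction, and style transfer, which improves the efficiency of AV data processing and simulation pipelines.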

Autonomous vehicle (AV) stacks are evolving from a hierarchy of discrete building blocks to end-to-end architectures built on foundation models. This transition demands an AV data flywheel to generate synthetic data, augment sensor datasets, address coverage gaps, and ultimately build a validation toolchain to safely develop and deploy autonomous vehicles.

In this blog post, we highlight the latest NVIDIA Omniverse and NVIDIA Cosmos workflows, models and datasets for developers to kickstart data pipelines.

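Specifically, this post covers:

- Processing massive datasets for AV testing and validation
- Neural reconstruction for AV simulation
- Diversifying data with world models
- Integrating neural reconstruction and world models into simulation pipelines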

Processing massive datasets for AV testing and validation

The reasoning vision-language-action (VLA) models that power the AV stack require massive amounts of driving data for both pre-training and post-training. As the stack becomes more mature, the data must become more targeted to address edge cases or weaknesses.

Data collected in the real world is at the core of these training and post-training datasets. To help kickstart development, NVIDIA has released one of the world’s largest multimodal AV datasets, featuring over 1,700 hours of camera, radar, and lidar data in 20-second clips, covering urban driving scenarios across more than 2,500 cities and 25 countries. The scenes cover a variety of traffic densities, weather conditions, and times of day, as well as infrastructure elements such as tunnels, bridges, roundabouts, railway crossings, toll booths, and inclines.
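
As a rough sense of scale, the figures above imply a lower bound on the number of clips. The sketch below assumes every clip is exactly 20 seconds long and treats 1,700 hours as a floor. The actual clip count is not stated in this post.

```python
# Lower bound on the clip count implied by "over 1,700 hours" of 20-second clips.
HOURS_FLOOR = 1_700   # the post says "over 1,700 hours", so this is a floor
CLIP_SECONDS = 20     # stated clip length

min_clips = HOURS_FLOOR * 3600 // CLIP_SECONDS
print(min_clips)  # 306000
```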

This data, which was used to develop NVIDIA’s internal reasoning VLA for autonomous driving, can be used for training and post-training, and can be scaled into larger datasets using the synthetic data generation workflows described in this post.

Once data has been collected and generated, it must be processed into useful clips. Cosmos Curator uses Cosmos Reason, an open reasoning vision-language model (VLM), to quickly filter, annotate, and deduplicate large amounts of sensor data. Cosmos Reason is available as an NVIDIA NIM, a secure, easy-to-use microservice for deploying high-performance generative AI in any environment.
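
To make the deduplication step concrete, here is a minimal sketch of near-duplicate removal over clip embeddings. It is not the Cosmos Curator API: the `deduplicate` helper, the 0.95 threshold, and the toy 3-dimensional embeddings are all assumptions for illustration. A real pipeline would obtain embeddings from a video model.

```python
import numpy as np

def deduplicate(embeddings: np.ndarray, threshold: float = 0.95) -> list[int]:
    """Greedy near-duplicate removal: keep a clip only if its cosine
    similarity to every already-kept clip is below `threshold`."""
    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    kept: list[int] = []
    for i, vec in enumerate(unit):
        if not kept or np.max(unit[kept] @ vec) < threshold:
            kept.append(i)
    return kept

# Hypothetical embeddings: clip 2 is almost identical to clip 0.
clips = np.array([
    [1.0,   0.0,  0.0],
    [0.0,   1.0,  0.0],
    [0.999, 0.01, 0.0],
    [0.0,   0.0,  1.0],
])
kept_indices = deduplicate(clips)
print(kept_indices)  # [0, 1, 3]
```

The greedy pass compares each clip against every kept clip, which is fine for a sketch but would likely be replaced by an approximate nearest-neighbor index at fleet scale.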

You can then create datasets containing targeted ego or actor behavior for specific post-training tasks—such as left turns at busy intersections—in a matter of seconds using Cosmos Dataset Search (CDS), a GPU-accelerated vector search workflow that quickly embeds and searches video datasets.
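
The retrieval idea behind a vector search workflow can be sketched as follows. This is not the CDS API: `top_k_search`, the toy 4-dimensional embeddings, and the clip names are hypothetical. A real system would embed text and video with a shared model and query a GPU-accelerated index instead of brute-force NumPy.

```python
import numpy as np

def top_k_search(query: np.ndarray, index: np.ndarray, k: int = 2) -> list[int]:
    """Return the indices of the k rows of `index` most similar to `query`
    by cosine similarity, best match first (brute force)."""
    q = query / np.linalg.norm(query)
    rows = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = rows @ q
    # argsort is ascending, so take the last k and reverse for best-first order.
    return np.argsort(scores)[-k:][::-1].tolist()

# Hypothetical clip names and embeddings.
clip_names = ["highway_merge", "left_turn_busy", "left_turn_empty", "parking"]
clip_embeddings = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.4, 0.0],
    [0.1, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.9],
])
# Stands in for the embedding of a prompt like "left turn at a busy intersection".
query_embedding = np.array([0.0, 1.0, 0.5, 0.0])

hits = top_k_search(query_embedding, clip_embeddings, k=2)
print([clip_names[i] for i in hits])  # ['left_turn_busy', 'left_turn_empty']
```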

Figure 1. Cosmos Dataset Search instantly retrieves scenarios based on text, image, or video prompts to create targeted datasets for post-training.

Neural reconstruction for AV simulation

Using advanced 3D techniques, namely neural reconstruction and rendering, developers can turn real-world datasets into interactive, high-fidelity simulations.

NVIDIA Omniverse NuRec

NVIDIA Omniverse NuRec is a set of technologies for neural reconstruction and rendering. It enables developers to use their existing fleet data to reconstruct high-fidelity digital twins, simulate new events, and render sensor data from novel points of view. NuRec’s libraries, models, and tools enable developers to:

- Prepare and process sensor data for reconstruction
- Reconstruct sensor data into 3D representations
- Perform Gaussian-based rendering
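
For readers new to Gaussian-based rendering, the core per-pixel step in standard 3D Gaussian splatting is front-to-back alpha compositing of depth-sorted Gaussians. The sketch below shows only that textbook step, not NuRec’s renderer. The `composite` helper and the sample values are illustrative assumptions.

```python
import numpy as np

def composite(colors: np.ndarray, alphas: np.ndarray) -> np.ndarray:
    """Front-to-back alpha compositing for one pixel.

    colors: (N, 3) RGB values of Gaussians sorted near to far.
    alphas: (N,) opacity each Gaussian contributes at this pixel.
    Returns sum_i c_i * a_i * prod_{j<i} (1 - a_j).
    """
    # Transmittance reaching each Gaussian: product of (1 - alpha) of those in front.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return weights @ colors

colors = np.array([[1.0, 0.0, 0.0],   # nearest Gaussian: red
                   [0.0, 0.0, 1.0]])  # farther Gaussian: blue
alphas = np.array([0.5, 0.5])
pixel = composite(colors, alphas)
print(pixel)  # red weight 0.5, blue weight 0.25
```

The nearer Gaussian absorbs half the light, so the farther one contributes only a quarter of its color. This ordering dependence is why depth sorting matters in such renderers.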

NuRec also includes generative AI models that enhance the quality of reconstructions for more robust simulation. NuRec Fixer is a transformer-based model post-trained on AV datasets to inpaint and resolve reconstruction artifacts. Developers can run Fixer during reconstruction or as a post-process during neural rendering to fix such artifacts. Fixer is based on the Difix3D+ paper presented at CVPR 2025. With Fixer, novel view synthesis from reconstructed scenes becomes practical for open-loop and closed-loop simulation workflows.

Video 1. NVIDIA NuRec Fixer addresses artifacts in reconstruction for higher-quality sensor simulation from real-world drives.

Diversify with world models

We can further scale data and amplify variation in simulation using world models that allow intelligent systems to simulate, predict, and interact with their environments.

NVIDIA Cosmos Predict and Transfer

The Cosmos Predict world foundation model (WFM) generates new video states from text, image, or video input for robotics and autonomous vehicle simulation.

Cosmos Transfer is a multi-ControlNet model built on Cosmos Predict that produces high-quality world simulations conditioned on spatial control inputs, which supply details such as road layout and object position and orientation. Users can prompt Cosmos Transfer to generate diverse weather, lighting, and terrain variations for any given scene.
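
One way to organize such variations is to enumerate a grid of conditions and turn each combination into a text prompt. The sketch below is a plain-Python illustration only: the condition lists, the `build_prompts` helper, and the prompt wording are hypothetical, and no Cosmos Transfer API is called.

```python
from itertools import product

# Hypothetical condition lists; a terrain axis could be added the same way.
WEATHER = ["clear", "rainy", "snowy"]
LIGHTING = ["midday", "sunset", "night"]

def build_prompts(scene: str) -> list[str]:
    """Build one text prompt per (weather, lighting) combination."""
    return [
        f"{scene}, {weather} weather, {lighting} lighting"
        for weather, lighting in product(WEATHER, LIGHTING)
    ]

prompts = build_prompts("urban four-way intersection")
print(len(prompts))  # 9
print(prompts[0])    # urban four-way intersection, clear weather, midday lighting
```

Each prompt would then be paired with the same spatial control inputs, so the road layout and object placement stay fixed while appearance changes.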

The latest model releases, Cosmos Predict 2.5 and Cosmos Transfer 2.5, can generate up to 30 seconds of new video with camera-controllable multi-view outputs and better adherence to control signals, to meet AV simulation needs.

Dive deeper with the Cosmos white paper for technical insights, and jumpstart your journey with the Cosmos Cookbook, a guide to building, customizing, and deploying Cosmos for your own autonomous-systems use cases.

Integrating neural reconstruction and world models into simulation pipelines

These models and workflows have been integrated into open-source and enterprise toolchains for easy adoption in existing simulation pipelines.

CARLA open source AV simulator

CARLA is one of the world’s most popular open source simulation platforms, with more than 150,000 active developers, and serves as a testbed for AV research and development. NVIDIA is partnering with CARLA to integrate the latest NuRec rendering APIs and the Cosmos Transfer world foundation model (WFM). This enables developers to generate sensor data from Gaussian representations with ray tracing and to amplify diversity with Cosmos WFMs.

Below is an example of a scene where CARLA is orchestrating the motion of all agents, including the ego-vehicle, and rendering sensor data from the ego point of view using NuRec. By adding reconstructed scenes and simulating new events with CARLA’s APIs and traffic model integrations, we can create useful corner-case datasets.

Cosmos Transfer integration with CARLA can then create variations of this scene for both training and testing, as shown in Video 3.

Video 2. Replay of a 3DGUT reconstructed drive in CARLA using NVIDIA NuRec

Novel view generation with NVIDIA Omniverse NuRec and Cosmos Transfer

When rendering a reconstructed scene from a novel view, there can be gaps in the reconstruction, which could lead to artifacts.

Developers can try out this pipeline using over 900 reconstructed scenes available in the NVIDIA Physical AI Open Datasets. With this latest version of CARLA, developers can now author completely new trajectories, reposition the camera, and simulate drives with this starter pack of reconstructed data.

Video 3. Upper right: replay of a 3DGUT reconstructed drive in CARLA using NuRec. Below, clockwise from left: variants of the reconstructed drive generated by Cosmos Transfer, including snowy, evening, clear weather with ivy on buildings, and sunset with glare.

CARLA developers using directable behavioral agent models such as Imagining The Road Ahead (ITRA) from Inverted.AI, and AV developers using the Foretellix Foretify data-automation toolchain, which is pre-integrated with CARLA and NVIDIA Cosmos, can generate realistic variations in scenarios and behaviors and scale up behavioral diversity.

Video 4. Generation of sensor data with the Cosmos-Transfer1-7B-Sample-AV [HDMap] model conditioned on text prompts and object-level simulation from Foretellix with physics from CARLA.

Video 5. Large-scale generation of AV sensor data with Cosmos Transfer conditioned on text prompts and outputs from CARLA and Inverted AI.

Voxel51 AV Simulation Data Pipeline

FiftyOne, from Voxel51, is a visual and multimodal AI data engine that enables physical AI developers to curate, annotate, and evaluate large datasets and models for training and testing. It integrates Cosmos Dataset Search (CDS), NuRec and Cosmos Transfer to create high-quality, simulation-ready datasets, enhancing each stage of the simulation pipeline:

- CDS allows users to perform fast, high-recall semantic searches over petabyte-scale video data to create targeted datasets for various downstream needs.
- NuRec integration allows users to convert raw data streams to validated datasets in NuRec format and reconstruct scenes. Developers can ingest their datasets, evaluate the quality of their reconstructions, and create 3D digital twins for downstream simulation tasks.
- Cosmos Transfer integration enables users to directly apply style transfer to their data, increasing their datasets’ diversity.

Stop by the Voxel51 booth (#411) at GTC DC to explore the workflow. See a first-ever demo of this end-to-end data pipeline at the launch webinar on Nov 5, 2025 at 9 AM PT.

Video 6. Replay of a real-world drive from the Waymo dataset with NVIDIA NuRec in Voxel51.

Get started developing today

Stay up-to-date by subscribing to NVIDIA news and following NVIDIA Omniverse on Discord and YouTube.

Get started with developer starter kits to quickly develop and enhance your own applications and services.

Join us for Physical AI and Robotics Day at NVIDIA GTC Washington, D.C. on October 29, 2025 as we bring together developers, researchers, and technology leaders to learn how NVIDIA technologies are accelerating the next era of AI.
