Multi-Stage Vision-Language QA System for Autonomous Driving

cs.AI updates on arXiv.org, October 23, 12:14


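This paper proposes a two-phase vision-language question-answering system for high-level perception, prediction, and planning questions in autonomous driving. Phase 1 uses a large multimodal language model (Qwen2.5-VL-32B) together with a self-consistency ensemble (multiple sampled reasoning chains) to improve answer reliability. Phase 2 adds nuScenes scene metadata and category-specific question instructions to the prompt to further refine the answers. Experiments show that the approach significantly outperforms the baseline models on a driving QA benchmark and maintains high accuracy (96%) under severe visual corruption.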
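The self-consistency ensemble mentioned above can be illustrated with a minimal sketch: sample several reasoning chains for the same prompt and keep the most frequent final answer. The names `sample_chain`, `extract_final_answer`, and the `Answer:` line convention are hypothetical and chosen only for illustration; the paper's actual sampling and aggregation details are not given in this summary.

```python
from collections import Counter
from typing import Callable, List


def extract_final_answer(chain: str) -> str:
    """Pull the final answer out of one reasoning chain.

    Assumed convention (not from the paper): the chain contains a line
    of the form "Answer: <choice>"; the last such line wins.
    """
    for line in reversed(chain.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip().upper()
    return ""


def self_consistent_answer(
    sample_chain: Callable[[str], str],  # hypothetical sampler wrapping the VLM
    prompt: str,
    n_samples: int = 5,
) -> str:
    """Sample n reasoning chains and return the majority final answer."""
    answers: List[str] = []
    for _ in range(n_samples):
        answer = extract_final_answer(sample_chain(prompt))
        if answer:  # ignore chains with no parseable answer
            answers.append(answer)
    if not answers:
        return ""
    # Ties are resolved in favor of the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]
```

In practice `sample_chain` would call the model with a nonzero sampling temperature so that the chains differ; with greedy decoding every sample would be identical and voting would add nothing.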

arXiv:2510.19001v1 Announce Type: cross Abstract: We present a two-phase vision-language QA system for autonomous driving that answers high-level perception, prediction, and planning questions. In Phase-1, a large multimodal LLM (Qwen2.5-VL-32B) is conditioned on six-camera inputs, a short temporal window of history, and a chain-of-thought prompt with few-shot exemplars. A self-consistency ensemble (multiple sampled reasoning chains) further improves answer reliability. In Phase-2, we augment the prompt with nuScenes scene metadata (object annotations, ego-vehicle state, etc.) and category-specific question instructions (separate prompts for perception, prediction, planning tasks). In experiments on a driving QA benchmark, our approach significantly outperforms the baseline Qwen2.5 models. For example, using 5 history frames and 10-shot prompting in Phase-1 yields 65.1% overall accuracy (vs. 62.61% with zero-shot); applying self-consistency raises this to 66.85%. Phase-2 achieves 67.37% overall. Notably, the system maintains 96% accuracy under severe visual corruption. These results demonstrate that carefully engineered prompts and contextual grounding can greatly enhance high-level driving QA with pretrained vision-language models.
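As a rough illustration of the Phase-2 idea, the sketch below assembles a text prompt from a category-specific instruction, scene metadata, and the question. The instruction wording, the metadata keys, and the function name `build_phase2_prompt` are all assumptions made for this example; the abstract does not publish the actual prompts or the metadata format.

```python
from typing import Dict

# Hypothetical instruction texts, one per question category named in the abstract.
CATEGORY_INSTRUCTIONS: Dict[str, str] = {
    "perception": "Describe what is visible around the ego vehicle.",
    "prediction": "Reason about how the listed objects are likely to move next.",
    "planning": "Decide what the ego vehicle should do and explain why.",
}


def build_phase2_prompt(question: str, category: str, metadata: Dict[str, object]) -> str:
    """Combine a category instruction, scene metadata, and the question.

    `metadata` stands in for nuScenes-derived fields such as object
    annotations or ego-vehicle state; the keys are illustrative only.
    """
    if category not in CATEGORY_INSTRUCTIONS:
        raise ValueError(f"unknown question category: {category!r}")
    meta_lines = "\n".join(f"- {key}: {value}" for key, value in metadata.items())
    return (
        f"{CATEGORY_INSTRUCTIONS[category]}\n\n"
        f"Scene metadata:\n{meta_lines}\n\n"
        f"Question: {question}"
    )


example = build_phase2_prompt(
    "Should the ego vehicle change lanes?",
    "planning",
    {"ego_speed_mps": 8.3, "objects": ["car (front, 12 m)", "pedestrian (right, 5 m)"]},
)
```

Keeping the instructions in a lookup table makes it easy to tune one category's wording without touching the others, which matches the abstract's description of separate prompts per task type.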

