cs.AI updates on arXiv.org 10月23日 12:14
多阶段视觉语言问答系统助力自动驾驶
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出了一种针对自动驾驶的高级感知、预测和规划问题的两阶段视觉语言问答系统。通过在第一阶段使用大型的多模态语言模型(Qwen2.5-VL-32B)和自洽性集成(多个样本推理链)来提高答案可靠性;在第二阶段通过添加nuScenes场景元数据及特定类别问题指令来进一步优化问答效果。实验结果表明,该方法在自动驾驶问答基准测试中显著优于基线模型,并在视觉污染情况下保持高准确率。

arXiv:2510.19001v1 Announce Type: cross Abstract: We present a two-phase vision-language QA system for autonomous driving that answers high-level perception, prediction, and planning questions. In Phase-1, a large multimodal LLM (Qwen2.5-VL-32B) is conditioned on six-camera inputs, a short temporal window of history, and a chain-of-thought prompt with few-shot exemplars. A self-consistency ensemble (multiple sampled reasoning chains) further improves answer reliability. In Phase-2, we augment the prompt with nuScenes scene metadata (object annotations, ego-vehicle state, etc.) and category-specific question instructions (separate prompts for perception, prediction, planning tasks). In experiments on a driving QA benchmark, our approach significantly outperforms the baseline Qwen2.5 models. For example, using 5 history frames and 10-shot prompting in Phase-1 yields 65.1% overall accuracy (vs.62.61% with zero-shot); applying self-consistency raises this to 66.85%. Phase-2 achieves 67.37% overall. Notably, the system maintains 96% accuracy under severe visual corruption. These results demonstrate that carefully engineered prompts and contextual grounding can greatly enhance high-level driving QA with pretrained vision-language models.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

视觉语言问答 自动驾驶 多阶段系统 高准确率
相关文章