cs.AI updates on arXiv.org 10月20日 12:08
VERITAS:提升SFT数据质量的多模态模型优化
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出一种名为VERITAS的优化流程,通过整合视觉先验和多种最先进的LMMs以及统计方法,以提升监督微调数据的质量,并通过实验验证其有效性。

arXiv:2510.15317v1 Announce Type: new Abstract: The quality of supervised fine-tuning (SFT) data is crucial for the performance of large multimodal models (LMMs), yet current data enhancement methods often suffer from factual errors and hallucinations due to inadequate visual perception. To address this challenge, we propose VERITAS, a pipeline that systematically integrates vision priors and multiple state-of-the-art LMMs with statistical methods to enhance SFT data quality. VERITAS leverages visual recognition models (RAM++) and OCR systems (PP-OCRv4) to extract structured vision priors, which are combined with images, questions, and answers. Three LMMs (GPT-4o, Gemini-2.5-Pro, Doubao-1.5-pro) evaluate the original answers, providing critique rationales and scores that are statistically fused into a high-confidence consensus score serving as ground truth. Using this consensus, we train a lightweight critic model via Group Relative Policy Optimization (GRPO), enhancing reasoning capabilities efficiently. Each LMM then refines the original answers based on the critiques, generating new candidate answers; we select the highest-scoring one as the final refined answer. Experiments across six multimodal benchmarks demonstrate that models fine-tuned with data processed by VERITAS consistently outperform those using raw data, particularly in text-rich and fine-grained reasoning tasks. Our critic model exhibits enhanced capability comparable to state-of-the-art LMMs while being significantly more efficient. We release our pipeline, datasets, and model checkpoints to advance research in multimodal data optimization.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

监督微调 多模态模型 数据质量优化 视觉先验 机器学习
相关文章