cs.AI updates on arXiv.org 09月30日 12:06
TQAs向MMQAs转换:构建高质量多模态基准
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出将文本问答对(TQAs)转化为高质量多模态问答对(MMQAs)的方法,构建了两个广泛的多模态问答基准,并开发了一种名为Q-Mirror的代理系统以实现MMQAs生成与评价的闭环迭代,有效提升了多模态问答质量评估。

arXiv:2509.24297v1 Announce Type: cross Abstract: High-quality, multi-modal benchmarks are crucial for advancing scientific reasoning in large models yet their manual creation is costly and unscalable. To address this bottleneck, we explore the potential for transforming Text-Only QA Pairs (TQAs) into high-quality Multi-Modal QA Pairs (MMQAs), which include three parts: 1) Task Definition \& Evaluation Rubric: We develop a TQA-to-MMQA framework and establish a comprehensive, multi-dimensional MMQA quality rubric that provides principles for the transformation. 2) Benchmark Construction: Then we construct two extensive benchmarks to rigorously evaluate state-of-the-art generation \& understanding models on the distinct tasks of MMQA generation \& MMQA quality evaluation. 3) Preliminary Solution: We develop an agentic system (Q-Mirror), which operationalizes our framework by integrating MMQA generation and evaluation into a closed loop for iterative refinement. Our experiments show that while state-of-the-art models can generate MMQAs, their outputs still leave substantial gaps, underscoring the need for reliable evaluation. We further demonstrate that top-tier understanding models align closely with human judgment in MMQA quality assessment. Leveraging both insights, the Q-Mirror agent raises average scores from 78.90 to 85.22 and pass rates from 72\% to 95\%, offering a practical path to large-scale scientific benchmarks.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

多模态问答 基准构建 代理系统 质量评估 科学推理
相关文章