cs.AI updates on arXiv.org 10月01日
视频摘要解释生成研究
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文通过扩展现有视频摘要多粒度解释框架,结合LLaVA-OneVision模型,生成自然语言描述,并提出一种基于语义重叠的视觉解释可解释性评估方法,以验证视频摘要解释的合理性。

arXiv:2509.26225v1 Announce Type: cross Abstract: In this paper, we present our experimental study on generating plausible textual explanations for the outcomes of video summarization. For the needs of this study, we extend an existing framework for multigranular explanation of video summarization by integrating a SOTA Large Multimodal Model (LLaVA-OneVision) and prompting it to produce natural language descriptions of the obtained visual explanations. Following, we focus on one of the most desired characteristics for explainable AI, the plausibility of the obtained explanations that relates with their alignment with the humans' reasoning and expectations. Using the extended framework, we propose an approach for evaluating the plausibility of visual explanations by quantifying the semantic overlap between their textual descriptions and the textual descriptions of the corresponding video summaries, with the help of two methods for creating sentence embeddings (SBERT, SimCSE). Based on the extended framework and the proposed plausibility evaluation approach, we conduct an experimental study using a SOTA method (CA-SUM) and two datasets (SumMe, TVSum) for video summarization, to examine whether the more faithful explanations are also the more plausible ones, and identify the most appropriate approach for generating plausible textual explanations for video summarization.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

视频摘要 可解释性AI LLaVA-OneVision
相关文章