PixelHumor: Evaluating the Humor Understanding of LMMs

cs.AI updates on arXiv.org, September 17

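This paper introduces PixelHumor, a dataset of 2,800 annotated multi-panel comics designed to evaluate how well large multimodal models (LMMs) understand multimodal humor and recognize narrative sequences. Experiments show that current top models reach only 61% accuracy on the panel-sequencing task, far below human performance, which exposes their limitations in integrating visual and textual cues.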

arXiv:2509.12248v1 Announce Type: cross

Abstract: Understanding humor is a core aspect of social intelligence, yet it remains a significant challenge for Large Multimodal Models (LMMs). We introduce PixelHumor, a benchmark dataset of 2,800 annotated multi-panel comics designed to evaluate LMMs' ability to interpret multimodal humor and recognize narrative sequences. Experiments with state-of-the-art LMMs reveal substantial gaps: for instance, top models achieve only 61% accuracy in panel sequencing, far below human performance. This underscores critical limitations in current models' integration of visual and textual cues for coherent narrative and humor understanding. By providing a rigorous framework for evaluating multimodal contextual and narrative reasoning, PixelHumor aims to drive the development of LMMs that better engage in natural, socially aware interactions.
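The abstract reports panel-sequencing accuracy but does not define the metric. As a minimal sketch, assuming "accuracy" means the fraction of comics whose predicted panel order exactly matches the reference order (an assumption; the paper may score sequencing differently, for example with partial credit), it could be computed as below. The function name `sequencing_accuracy` and the toy data are hypothetical.

```python
from typing import Sequence


def sequencing_accuracy(
    predicted: Sequence[Sequence[int]], gold: Sequence[Sequence[int]]
) -> float:
    """Fraction of comics whose predicted panel order exactly matches the gold order.

    Hypothetical exact-match scoring: a comic counts as correct only if every
    panel is placed in the right position; partially correct orders score 0.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of comics")
    if not gold:
        return 0.0
    # Compare each predicted ordering with its reference ordering as whole lists.
    correct = sum(1 for p, g in zip(predicted, gold) if list(p) == list(g))
    return correct / len(gold)


# Hypothetical toy data: each inner list gives panel indices in reading order.
gold = [[0, 1, 2, 3], [0, 1, 2], [0, 1, 2, 3]]
pred = [[0, 1, 2, 3], [1, 0, 2], [0, 1, 2, 3]]
print(sequencing_accuracy(pred, gold))  # 2 of the 3 comics are ordered exactly right
```

Under this exact-match assumption, a score of 61% would mean that roughly 61 of every 100 comics were ordered entirely correctly. A partial-credit metric such as pairwise order agreement would be at least as high for the same predictions, since it rewards orders that are only partly right.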
