LVBench: An Extreme Long Video Understanding Benchmark

cs.AI updates on arXiv.org, August 12

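The article introduces LVBench, a long video understanding benchmark intended to improve how multimodal large language models perform on long videos. Built on a public dataset with a diverse set of tasks, it challenges model capabilities and aims to drive progress in related technology.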

arXiv:2406.08035v3 Announce Type: replace-cross

Abstract: Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the demands of real-world applications such as embodied intelligence for long-term decision-making, in-depth movie reviews and discussions, and live sports commentary, all of which require comprehension of long videos spanning several hours. To address this gap, we introduce LVBench, a benchmark specifically designed for long video understanding. Our dataset comprises publicly sourced videos and encompasses a diverse set of tasks aimed at long video comprehension and information extraction. LVBench is designed to challenge multimodal models to demonstrate long-term memory and extended comprehension capabilities. Our extensive evaluations reveal that current multimodal models still underperform on these demanding long video understanding tasks. Through LVBench, we aim to spur the development of more advanced models capable of tackling the complexities of long video comprehension. Our data and code are publicly available at: https://lvbench.github.io.

Tags: LVBench, long video understanding, multimodal models, information extraction, datasets