CHIRP: A Fine-Grained Benchmark for Open-Ended Response Evaluation in Vision-Language Models

cs.AI updates on arXiv.org, August 6

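This paper analyzes existing evaluation methods for vision-language models (VLMs) and introduces two contributions: Robin, a new suite of VLMs, and CHIRP, a long-form response benchmark. Together they aim to make VLM evaluation more comprehensive and reliable.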

arXiv:2501.09672v3 Announce Type: replace-cross

Abstract: The proliferation of Vision-Language Models (VLMs) in the past several years calls for rigorous and comprehensive evaluation methods and benchmarks. This work analyzes existing VLM evaluation techniques, including automated metrics, AI-based assessments, and human evaluations across diverse tasks. We first introduce Robin, a novel suite of VLMs that we built by combining Large Language Models (LLMs) and Vision Encoders (VEs) at multiple scales, and use Robin to identify shortcomings of current evaluation approaches across scales. Next, to overcome the identified limitations, we introduce CHIRP, a new long-form response benchmark we developed for more robust and complete VLM evaluation. We provide open access to the Robin training code, model suite, and CHIRP benchmark to promote reproducibility and advance VLM research.
