FineVision: A Large-Scale Dataset for Vision-Language Models

cs.AI updates on arXiv.org, October 21, 12:28


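This article introduces FineVision, a carefully collected, curated, and unified vision-language model dataset of 24 million samples. A semi-automated pipeline cleans the data and unifies its format, with human review to ensure data quality. The study reports that models trained on FineVision outperform models trained on existing datasets across multiple evaluations.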

arXiv:2510.17269v1 Announce Type: cross

Abstract: The advancement of vision-language models (VLMs) is hampered by a fragmented landscape of inconsistent and contaminated public datasets. We introduce FineVision, a meticulously collected, curated, and unified corpus of 24 million samples - the largest open resource of its kind. We unify more than 200 sources into 185 subsets via a semi-automated, human-in-the-loop pipeline: automation performs bulk ingestion and schema mapping, while reviewers audit mappings and spot-check outputs to verify faithful consumption of annotations, appropriate formatting and diversity, and safety; issues trigger targeted fixes and re-runs. The workflow further applies rigorous de-duplication within and across sources and decontamination against 66 public benchmarks. FineVision also encompasses agentic/GUI tasks with a unified action space; reviewers validate schemas and inspect a sample of trajectories to confirm executable fidelity. Models trained on FineVision consistently outperform those trained on existing open mixtures across a broad evaluation suite, underscoring the benefits of scale, data hygiene, and balanced automation with human oversight. We release the corpus and curation tools to accelerate data-centric VLM research.
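
To make the de-duplication and decontamination step more concrete, here is a minimal sketch of how such a filter might look. It is not the authors' implementation: the abstract does not say how duplicates are detected, so this example assumes exact matching on a SHA-256 hash of the raw image bytes, and the names `fingerprint`, `dedup_and_decontaminate`, and the sample fields `source` and `image` are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def fingerprint(image_bytes: bytes) -> str:
    """Exact-match fingerprint of raw image bytes.

    Assumption: a plain SHA-256 hash stands in for whatever
    similarity measure the real pipeline uses.
    """
    return hashlib.sha256(image_bytes).hexdigest()


def dedup_and_decontaminate(
    samples: Iterable[Dict],
    benchmark_images: Iterable[bytes],
) -> List[Dict]:
    """Keep the first copy of each image and drop benchmark overlaps.

    `samples` are dicts with hypothetical keys "source" and "image".
    `benchmark_images` are raw image bytes from evaluation sets.
    """
    # Fingerprints of benchmark images that must not leak into training data.
    blocked = {fingerprint(b) for b in benchmark_images}
    # Fingerprints already kept, shared across all sources.
    seen = set()
    kept = []
    for sample in samples:
        fp = fingerprint(sample["image"])
        if fp in blocked:
            continue  # decontamination: image also appears in a benchmark
        if fp in seen:
            continue  # de-duplication: within or across sources
        seen.add(fp)
        kept.append(sample)
    return kept


if __name__ == "__main__":
    samples = [
        {"source": "a", "image": b"cat"},
        {"source": "b", "image": b"cat"},   # cross-source duplicate
        {"source": "b", "image": b"dog"},
        {"source": "c", "image": b"test"},  # overlaps with a benchmark
    ]
    kept = dedup_and_decontaminate(samples, benchmark_images=[b"test"])
    print([(s["source"], s["image"]) for s in kept])
```

Exact hashing only catches byte-identical images; resized or re-encoded copies would pass through, so a production pipeline would likely need a perceptual or embedding-based comparison instead.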


