LMMs_Fishai

Trending

Articles related to "LMMs"

FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding

cs.AI updates on arXiv.org 2025-11-05T05:20:37.000000Z

From Charts to Code: A Hierarchical Benchmark for Multimodal Models

cs.AI updates on arXiv.org 2025-10-22T04:18:00.000000Z

VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data

cs.AI updates on arXiv.org 2025-10-20T04:08:46.000000Z

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

cs.AI updates on arXiv.org 2025-10-10T04:19:53.000000Z

Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices

cs.AI updates on arXiv.org 2025-10-08T04:08:01.000000Z

Multimodal Function Vectors for Spatial Relations

cs.AI updates on arXiv.org 2025-10-06T04:19:16.000000Z

A New No. 1! OCRBench v2 September Leaderboard: Revealing the True Document Intelligence Level of Large Multimodal Models

PaperWeekly 2025-10-01T11:22:38.000000Z

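OCRBench v2 September 2025 Leaderboard Released! Revealing the True Document Intelligence Level of Large Multimodal Models

我爱计算机视觉 (I Love Computer Vision) 2025-10-01T09:39:52.000000Z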

A New No. 1! OCRBench v2 September Leaderboard: Revealing the True Document Intelligence Level of Large Multimodal Models

PaperWeekly 2025-09-30T15:51:38.000000Z

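OCRBench v2 September 2025 Leaderboard Released! Revealing the True Document Intelligence Level of Large Multimodal Models

我爱计算机视觉 (I Love Computer Vision) 2025-09-25T09:50:35.000000Z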

Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics

cs.AI updates on arXiv.org 2025-09-17T04:59:38.000000Z

InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning

cs.AI updates on arXiv.org 2025-09-17T04:45:25.000000Z

Promptception: How Sensitive Are Large Multimodal Models to Prompts?

cs.AI updates on arXiv.org 2025-09-05T04:45:56.000000Z

BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset

cs.AI updates on arXiv.org 2025-07-08T06:58:08.000000Z

Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark!

BAIR 2024-11-26T06:02:14.000000Z

MINT-1T: An Open-Source Trillion Token Multimodal Interleaved Dataset and a Key Component for Training Large Multimodal Models (LMMs)

MarkTechPost@AI 2024-06-20T07:01:47.000000Z

Copyright © 2019 FISHAI. All Rights Reserved