Trending
Articles related to "LMMs"
FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding
cs.AI updates on arXiv.org 2025-11-05T05:20:37.000000Z
From Charts to Code: A Hierarchical Benchmark for Multimodal Models
cs.AI updates on arXiv.org 2025-10-22T04:18:00.000000Z
VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data
cs.AI updates on arXiv.org 2025-10-20T04:08:46.000000Z
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
cs.AI updates on arXiv.org 2025-10-10T04:19:53.000000Z
Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices
cs.AI updates on arXiv.org 2025-10-08T04:08:01.000000Z
Multimodal Function Vectors for Spatial Relations
cs.AI updates on arXiv.org 2025-10-06T04:19:16.000000Z
A New No. 1! OCRBench v2 September Leaderboard: Revealing the True Document-Intelligence Level of Large Multimodal Models
PaperWeekly 2025-10-01T11:22:38.000000Z
OCRBench v2 September 2025 Leaderboard Released! Revealing the True Document-Intelligence Level of Large Multimodal Models
我爱计算机视觉 2025-10-01T09:39:52.000000Z
A New No. 1! OCRBench v2 September Leaderboard: Revealing the True Document-Intelligence Level of Large Multimodal Models
PaperWeekly 2025-09-30T15:51:38.000000Z
OCRBench v2 September 2025 Leaderboard Released! Revealing the True Document-Intelligence Level of Large Multimodal Models
我爱计算机视觉 2025-09-25T09:50:35.000000Z
Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics
cs.AI updates on arXiv.org 2025-09-17T04:59:38.000000Z
InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning
cs.AI updates on arXiv.org 2025-09-17T04:45:25.000000Z
Promptception: How Sensitive Are Large Multimodal Models to Prompts?
cs.AI updates on arXiv.org 2025-09-05T04:45:56.000000Z
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
cs.AI updates on arXiv.org 2025-07-08T06:58:08.000000Z
Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark!
BAIR 2024-11-26T06:02:14.000000Z
MINT-1T: An Open-Source Trillion Token Multimodal Interleaved Dataset and a Key Component for Training Large Multimodal Models (LMMs)
MarkTechPost@AI 2024-06-20T07:01:47.000000Z