"Large Multimodal Models" Related Articles
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
cs.AI updates on arXiv.org
2025-10-20T04:10:18.000000Z
LM Fight Arena: Benchmarking Large Multimodal Models via Game Competition
cs.AI updates on arXiv.org
2025-10-13T04:09:09.000000Z
How to Teach Large Multimodal Models New Skills
cs.AI updates on arXiv.org
2025-10-10T04:09:26.000000Z
Multimodal Function Vectors for Spatial Relations
cs.AI updates on arXiv.org
2025-10-06T04:19:16.000000Z
NeurIPS 2025 | UniPixel: The First Pixel-Level Reasoning Framework to Unify Object Referring and Segmentation, Letting Large Models Understand Every Pixel
我爱计算机视觉
2025-10-01T09:39:51.000000Z
NeurIPS 2025 | UniPixel: The First Pixel-Level Reasoning Framework to Unify Object Referring and Segmentation, Letting Large Models Understand Every Pixel
我爱计算机视觉
2025-09-29T09:10:34.000000Z
Unveiling Effective In-Context Configurations for Image Captioning: An External & Internal Analysis
cs.AI updates on arXiv.org
2025-07-14T04:08:24.000000Z
LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation
cs.AI updates on arXiv.org
2025-07-11T04:04:05.000000Z
PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning
cs.AI updates on arXiv.org
2025-07-03T04:07:25.000000Z
MMSearch-R1: End-to-End Reinforcement Learning for Active Image Search in LMMs
MarkTechPost@AI
2025-04-07T04:08:41.000000Z
Salesforce AI Research Introduce xGen-MM (BLIP-3): A Scalable AI Framework for Advancing Large Multimodal Models with Enhanced Training and Performance Capabilities
MarkTechPost@AI
2024-08-19T22:04:54.000000Z
MINT-1T Dataset Released: A Multimodal Dataset with One Trillion Tokens to Build Large Multimodal Models
MarkTechPost@AI
2024-07-26T12:04:20.000000Z
Visual Haystacks Benchmark: The First “Visual-Centric” Needle-In-A-Haystack (NIAH) Benchmark to Assess LMMs’ Capability in Long-Context Visual Retrieval and Reasoning
MarkTechPost@AI
2024-07-24T07:19:20.000000Z
LLaVA-NeXT-Interleave: A Versatile Large Multimodal Model LMM that can Handle Settings like Multi-image, Multi-frame, and Multi-view
MarkTechPost@AI
2024-07-13T16:46:13.000000Z
LongVA and the Impact of Long Context Transfer in Visual Processing: Enhancing Large Multimodal Models for Long Video Sequences
MarkTechPost@AI
2024-06-29T07:01:45.000000Z