Large Multimodal Models_Fishai

Articles related to "Large Multimodal Models"

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models

cs.AI updates on arXiv.org 2025-10-20T04:10:18.000000Z

LM Fight Arena: Benchmarking Large Multimodal Models via Game Competition

cs.AI updates on arXiv.org 2025-10-13T04:09:09.000000Z

How to Teach Large Multimodal Models New Skills

cs.AI updates on arXiv.org 2025-10-10T04:09:26.000000Z

Multimodal Function Vectors for Spatial Relations

cs.AI updates on arXiv.org 2025-10-06T04:19:16.000000Z

NeurIPS 2025 | UniPixel: The First Pixel-Level Reasoning Framework Unifying Object Referring and Segmentation, Letting Large Models Understand Every Pixel

我爱计算机视觉 2025-10-01T09:39:51.000000Z

NeurIPS 2025 | UniPixel: The First Pixel-Level Reasoning Framework Unifying Object Referring and Segmentation, Letting Large Models Understand Every Pixel

我爱计算机视觉 2025-09-29T09:10:34.000000Z

Unveiling Effective In-Context Configurations for Image Captioning: An External & Internal Analysis

cs.AI updates on arXiv.org 2025-07-14T04:08:24.000000Z

LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation

cs.AI updates on arXiv.org 2025-07-11T04:04:05.000000Z

PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning

cs.AI updates on arXiv.org 2025-07-03T04:07:25.000000Z

MMSearch-R1: End-to-End Reinforcement Learning for Active Image Search in LMMs

MarkTechPost@AI 2025-04-07T04:08:41.000000Z

Salesforce AI Research Introduce xGen-MM (BLIP-3): A Scalable AI Framework for Advancing Large Multimodal Models with Enhanced Training and Performance Capabilities

MarkTechPost@AI 2024-08-19T22:04:54.000000Z

MINT-1T Dataset Released: A Multimodal Dataset with One Trillion Tokens to Build Large Multimodal Models

MarkTechPost@AI 2024-07-26T12:04:20.000000Z

Visual Haystacks Benchmark: The First “Visual-Centric” Needle-In-A-Haystack (NIAH) Benchmark to Assess LMMs’ Capability in Long-Context Visual Retrieval and Reasoning

MarkTechPost@AI 2024-07-24T07:19:20.000000Z

LLaVA-NeXT-Interleave: A Versatile Large Multimodal Model LMM that can Handle Settings like Multi-image, Multi-frame, and Multi-view

MarkTechPost@AI 2024-07-13T16:46:13.000000Z

LongVA and the Impact of Long Context Transfer in Visual Processing: Enhancing Large Multimodal Models for Long Video Sequences

MarkTechPost@AI 2024-06-29T07:01:45.000000Z

Copyright © 2019 FISHAI. All Rights Reserved