Multimodal Large Language Models_Fishai

Articles related to "Multimodal Large Language Models"

Sketch2BIM: A Multi-Agent Human-AI Collaborative Pipeline to Convert Hand-Drawn Floor Plans to 3D BIM

cs.AI updates on arXiv.org 2025-10-27T06:16:22.000000Z

Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning

cs.AI updates on arXiv.org 2025-10-15T05:10:08.000000Z

Training-Free In-Context Forensic Chain for Image Manipulation Detection and Localization

cs.AI updates on arXiv.org 2025-10-14T04:17:48.000000Z

Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras

cs.AI updates on arXiv.org 2025-10-13T04:14:11.000000Z

MLLM4TS: Leveraging Vision and Multimodal Language Models for General Time-Series Analysis

cs.AI updates on arXiv.org 2025-10-10T04:10:39.000000Z

Seeing the Big Picture: Evaluating Multimodal LLMs' Ability to Interpret and Grade Handwritten Student Work

cs.AI updates on arXiv.org 2025-10-08T04:12:17.000000Z

V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs

cs.AI updates on arXiv.org 2025-10-01T06:00:57.000000Z

Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy

cs.AI updates on arXiv.org 2025-09-30T04:06:45.000000Z

The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA

cs.AI updates on arXiv.org 2025-09-23T05:52:44.000000Z

RadarQA: Multi-modal Quality Analysis of Weather Radar Forecasts

cs.AI updates on arXiv.org 2025-08-19T04:01:30.000000Z

Silicon Minds versus Human Hearts: The Wisdom of Crowds Beats the Wisdom of AI in Emotion Recognition

cs.AI updates on arXiv.org 2025-08-13T04:14:56.000000Z

SafePLUG: Empowering Multimodal LLMs with Pixel-Level Insight and Temporal Grounding for Traffic Accident Understanding

cs.AI updates on arXiv.org 2025-08-12T04:02:28.000000Z

MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization

cs.AI updates on arXiv.org 2025-08-12T04:02:25.000000Z

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation

cs.AI updates on arXiv.org 2025-07-30T04:46:14.000000Z

T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation

cs.AI updates on arXiv.org 2025-07-29T04:22:26.000000Z

DOGR: Towards Versatile Visual Document Grounding and Referring

cs.AI updates on arXiv.org 2025-07-22T04:34:00.000000Z

A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends

cs.AI updates on arXiv.org 2025-07-15T04:26:56.000000Z

Advancing MLLM Alignment Through MM-RLHF: A Large-Scale Human Preference Dataset for Multimodal Tasks

MarkTechPost@AI 2025-02-19T18:33:56.000000Z

Meta AI Releases LongVU: A Multimodal Large Language Model that can Address the Significant Challenge of Long Video Understanding

MarkTechPost@AI 2024-10-30T22:50:09.000000Z

SafeBench: A Safety Evaluation Framework for Multimodal Large Models That Reveals MLLM Safety Risks

MIT Technology Review - This Week's Top Stories 2024-10-28T02:45:33.000000Z

Copyright © 2019 FISHAI. All Rights Reserved