Articles related to "Multimodal Large Language Models"
Sketch2BIM: A Multi-Agent Human-AI Collaborative Pipeline to Convert Hand-Drawn Floor Plans to 3D BIM
cs.AI updates on arXiv.org
2025-10-27T06:16:22.000000Z
Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
cs.AI updates on arXiv.org
2025-10-15T05:10:08.000000Z
Training-Free In-Context Forensic Chain for Image Manipulation Detection and Localization
cs.AI updates on arXiv.org
2025-10-14T04:17:48.000000Z
Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras
cs.AI updates on arXiv.org
2025-10-13T04:14:11.000000Z
MLLM4TS: Leveraging Vision and Multimodal Language Models for General Time-Series Analysis
cs.AI updates on arXiv.org
2025-10-10T04:10:39.000000Z
Seeing the Big Picture: Evaluating Multimodal LLMs' Ability to Interpret and Grade Handwritten Student Work
cs.AI updates on arXiv.org
2025-10-08T04:12:17.000000Z
V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs
cs.AI updates on arXiv.org
2025-10-01T06:00:57.000000Z
Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction-Reasoning Synergy
cs.AI updates on arXiv.org
2025-09-30T04:06:45.000000Z
The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
cs.AI updates on arXiv.org
2025-09-23T05:52:44.000000Z
RadarQA: Multi-modal Quality Analysis of Weather Radar Forecasts
cs.AI updates on arXiv.org
2025-08-19T04:01:30.000000Z
Silicon Minds versus Human Hearts: The Wisdom of Crowds Beats the Wisdom of AI in Emotion Recognition
cs.AI updates on arXiv.org
2025-08-13T04:14:56.000000Z
SafePLUG: Empowering Multimodal LLMs with Pixel-Level Insight and Temporal Grounding for Traffic Accident Understanding
cs.AI updates on arXiv.org
2025-08-12T04:02:28.000000Z
MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization
cs.AI updates on arXiv.org
2025-08-12T04:02:25.000000Z
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
cs.AI updates on arXiv.org
2025-07-30T04:46:14.000000Z
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation
cs.AI updates on arXiv.org
2025-07-29T04:22:26.000000Z
DOGR: Towards Versatile Visual Document Grounding and Referring
cs.AI updates on arXiv.org
2025-07-22T04:34:00.000000Z
A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends
cs.AI updates on arXiv.org
2025-07-15T04:26:56.000000Z
Advancing MLLM Alignment Through MM-RLHF: A Large-Scale Human Preference Dataset for Multimodal Tasks
MarkTechPost@AI
2025-02-19T18:33:56.000000Z
Meta AI Releases LongVU: A Multimodal Large Language Model that can Address the Significant Challenge of Long Video Understanding
MarkTechPost@AI
2024-10-30T22:50:09.000000Z
SafeBench: A Safety Evaluation Framework for Multimodal Large Models That Reveals MLLM Safety Risks
MIT Technology Review - This Week's Top Stories
2024-10-28T02:45:33.000000Z