Hot Topics
Articles related to "Cross-Modal Understanding"
HMVLM: Human Motion-Vision-Language Model via MoE LoRA
cs.AI updates on arXiv.org 2025-11-05T05:30:35.000000Z
Surpassing Google and Meta: Why Has 360's FG-CLIP2 Become the "World's Strongest Image-Text Model"?
AI大模型工场 2025-11-04T16:29:32.000000Z
Visual Features Across Modalities: SVG and ASCII Art Reveal Cross-Modal Understanding
https://simonwillison.net/atom/everything 2025-10-25T03:31:07.000000Z
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
cs.AI updates on arXiv.org 2025-10-20T04:14:47.000000Z
Being-VL's Visual BPE Approach: Truly Unifying "Seeing" and "Speaking"
机器之心 2025-10-09T04:21:35.000000Z
From Discrete Tokens to Multimodal Unification: A Panoramic Survey of Discrete Tokenization Now Available
PaperWeekly 2025-08-11T08:59:59.000000Z
Integrating 500+ Real-World Multimodal Tasks! The New MEGA-Bench Evaluation Suite: Does CoT Actually Hurt Open-Source Models?
新智元 2024-11-16T14:16:08.000000Z
Integrating 500+ Real-World Multimodal Tasks, the New MEGA-Bench Evaluation Suite: Does CoT Actually Hurt Open-Source Models?
36kr-科技 2024-11-15T07:36:44.000000Z