Vision-Language Models | Fishai


Articles related to "Vision-Language Models"

Say goodbye to tedious document processing! A local deployment tutorial for PaddleOCR-VL-vLLM-OpenAI-API: accurately parse text, tables, and formulas

Juejin AI 2025-11-07T21:58:13.000000Z

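Peking University team teaches AI archaeology! The world's first 3D visual question answering dataset of ancient Greek pottery is released, together with a dedicated model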

BAAI Community 2025-11-07T15:22:50.000000Z

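Simplicity is the ultimate sophistication: the Chinese Academy of Sciences and collaborators propose OneRef, unifying visual grounding and referring segmentation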

I Love Computer Vision 2025-11-07T09:29:07.000000Z

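Peking University team teaches AI archaeology! The world's first 3D visual question answering dataset of ancient Greek pottery is released, together with a dedicated model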

QbitAI 2025-11-07T09:23:09.000000Z

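DAMO Academy, Zhejiang University, and The Hong Kong Polytechnic University launch PixelRefer: multimodal large models move toward pixel-level visual understanding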

PaperWeekly 2025-11-06T16:29:51.000000Z

The OCR battle heats up again: how does LightOnOCR-1B manage to be 1.7x faster than DeepSeekOCR? (demo and open-source links included)

Juejin AI 2025-11-05T14:22:29.000000Z

GenDexHand: Generative Simulation for Dexterous Hands

cs.AI updates on arXiv.org 2025-11-05T05:30:59.000000Z

Privacy Preserving Ordinal-Meta Learning with VLMs for Fine-Grained Fruit Quality Prediction

cs.AI updates on arXiv.org 2025-11-05T05:30:32.000000Z

Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots

cs.AI updates on arXiv.org 2025-11-05T05:28:38.000000Z

A Retrospect to Multi-prompt Learning across Vision and Language

cs.AI updates on arXiv.org 2025-11-05T05:20:52.000000Z

Latent Domain Prompt Learning for Vision-Language Models

cs.AI updates on arXiv.org 2025-11-05T05:17:43.000000Z

SpatialTraceGen: High-Fidelity Traces for Efficient VLM Spatial Reasoning Distillation

cs.AI updates on arXiv.org 2025-11-05T05:17:11.000000Z

Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World

cs.AI updates on arXiv.org 2025-11-05T05:16:51.000000Z

Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries

cs.AI updates on arXiv.org 2025-11-05T05:14:24.000000Z

Surpassing Google and Meta: how did 360's FG-CLIP2 become the "world's strongest image-text model"?

AI Large Model Workshop 2025-11-04T16:29:32.000000Z

The unassuming master of vision-language models: 360 quietly open-sources FG-CLIP2, which tops 29 global benchmarks | Jiazi Guangnian

Jiazi Guangnian 2025-11-04T12:26:50.000000Z

PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting

cs.AI updates on arXiv.org 2025-11-03T05:19:58.000000Z

Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning

cs.AI updates on arXiv.org 2025-11-03T05:19:53.000000Z

Generating Accurate and Detailed Captions for High-Resolution Images

cs.AI updates on arXiv.org 2025-11-03T05:19:12.000000Z

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

cs.AI updates on arXiv.org 2025-11-03T05:18:49.000000Z

Copyright © 2019 FISHAI. All Rights Reserved.