Articles related to "Vision-Language Models"
Say goodbye to tedious document processing! PaddleOCR-VL-vLLM-OpenAI-API local deployment tutorial: accurate parsing of text, tables, and formulas
掘金 人工智能
2025-11-07T21:58:13.000000Z
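Peking University team teaches AI archaeology! World's first 3D visual question answering dataset for ancient Greek pottery released, along with a dedicated model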
智源社区
2025-11-07T15:22:50.000000Z
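The greatest truths are the simplest: Chinese Academy of Sciences and collaborators propose OneRef, unifying visual grounding and referring segmentation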
我爱计算机视觉
2025-11-07T09:29:07.000000Z
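Peking University team teaches AI archaeology! World's first 3D visual question answering dataset for ancient Greek pottery released, along with a dedicated model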
量子位
2025-11-07T09:23:09.000000Z
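DAMO Academy, with Zhejiang University and The Hong Kong Polytechnic University, launches PixelRefer: multimodal large models move toward pixel-level visual understanding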
PaperWeekly
2025-11-06T16:29:51.000000Z
The OCR battle heats up again: why is LightOnOCR-1B 1.7x faster than DeepSeekOCR? (demo and open-source links included)
掘金 人工智能
2025-11-05T14:22:29.000000Z
GenDexHand: Generative Simulation for Dexterous Hands
cs.AI updates on arXiv.org
2025-11-05T05:30:59.000000Z
Privacy Preserving Ordinal-Meta Learning with VLMs for Fine-Grained Fruit Quality Prediction
cs.AI updates on arXiv.org
2025-11-05T05:30:32.000000Z
Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots
cs.AI updates on arXiv.org
2025-11-05T05:28:38.000000Z
A Retrospect to Multi-prompt Learning across Vision and Language
cs.AI updates on arXiv.org
2025-11-05T05:20:52.000000Z
Latent Domain Prompt Learning for Vision-Language Models
cs.AI updates on arXiv.org
2025-11-05T05:17:43.000000Z
SpatialTraceGen: High-Fidelity Traces for Efficient VLM Spatial Reasoning Distillation
cs.AI updates on arXiv.org
2025-11-05T05:17:11.000000Z
Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World
cs.AI updates on arXiv.org
2025-11-05T05:16:51.000000Z
Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries
cs.AI updates on arXiv.org
2025-11-05T05:14:24.000000Z
Surpassing Google and Meta: how did 360's FG-CLIP2 become the "world's strongest image-text model"?
AI大模型工场
2025-11-04T16:29:32.000000Z
The "hidden master" of vision-language models: 360 quietly open-sources FG-CLIP2, which tops 29 global benchmarks | 甲子光年
甲子光年
2025-11-04T12:26:50.000000Z
PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting
cs.AI updates on arXiv.org
2025-11-03T05:19:58.000000Z
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
cs.AI updates on arXiv.org
2025-11-03T05:19:53.000000Z
Generating Accurate and Detailed Captions for High-Resolution Images
cs.AI updates on arXiv.org
2025-11-03T05:19:12.000000Z
Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
cs.AI updates on arXiv.org
2025-11-03T05:18:49.000000Z