Articles related to "Visual Grounding"
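Simplicity Is Key: Chinese Academy of Sciences and Collaborators Propose OneRef, Unifying Visual Grounding and Referring Segmentation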
我爱计算机视觉
2025-11-07T15:26:12.000000Z
No Restructuring, No Loss of Generality: VLM-FO1 Losslessly Enhances Fine-Grained Perception for Any VLM
PaperWeekly
2025-10-23T13:23:59.000000Z
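No More "Guessing Coordinates"! Shuicheng Yan's Team and Collaborators Jointly Release the PaDT Multimodal Large Model, Achieving True Multimodal Representation Output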
机器之心
2025-10-16T06:43:35.000000Z
RadVLM: A Multitask Conversational Vision-Language Model for Radiology
cs.AI updates on arXiv.org
2025-10-13T04:15:03.000000Z
GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding
cs.AI updates on arXiv.org
2025-10-07T04:16:16.000000Z
Point-It-Out: Benchmarking Embodied Reasoning for Vision Language Models in Multi-Stage Visual Grounding
cs.AI updates on arXiv.org
2025-10-01T06:00:59.000000Z
Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models
machinelearning apple
2025-09-28T15:41:02.000000Z
📆 ThursdAI - Sep 18 - Gpt-5-Codex, OAI wins ICPC, Reve, ARC-AGI SOTA Interview, Meta AI Glasses & more AI news
ThursdAI - Recaps of the most high signal AI weekly spaces
2025-09-25T10:01:32.000000Z
Controlling Multimodal LLMs via Reward-guided Decoding
cs.AI updates on arXiv.org
2025-08-18T04:21:34.000000Z
Vision-Based Localization and LLM-based Navigation for Indoor Environments
cs.AI updates on arXiv.org
2025-08-12T04:02:08.000000Z
Latent Expression Generation for Referring Image Segmentation and Grounding
cs.AI updates on arXiv.org
2025-08-08T04:17:45.000000Z
LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering
cs.AI updates on arXiv.org
2025-07-22T04:44:42.000000Z
Visual Grounding Methods for Efficient Interaction with Desktop Graphical User Interfaces
cs.AI updates on arXiv.org
2025-07-21T04:06:33.000000Z
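R1-Style Reinforcement Learning Transferred to Visual Grounding! Fully Open-Source Vision-R1 Boosts Vision-Language Large Model Performance by 50%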
机器之心
2025-04-08T07:53:36.000000Z
A New Paradigm for Visual Grounding! Tsinghua Team Introduces Migician, Supporting Multi-Image Grounding in Any Form
智源社区
2025-02-22T16:07:24.000000Z
A 27-Page Survey with 354 References! The Most Detailed Visual Grounding Survey Is Here
机器之心
2025-01-31T06:49:50.000000Z
Must-Read for Newcomers to Visual Grounding! Keep Up with the Latest Progress: Essential Papers for Visual Grounding Reviewers!
我爱计算机视觉
2025-01-20T13:56:00.000000Z