Visual Reasoning_Fishai

Articles related to "Visual Reasoning"

2025.11.05 | Testing Code with Vector Sketches; Draw First, Then Think, to Supplement Vision

HuggingFace Daily AI Paper Digest 2025-11-06T06:56:26.000000Z

A Multi-Modal Neuro-Symbolic Approach for Spatial Reasoning-Based Visual Grounding in Robotics

cs.AI updates on arXiv.org 2025-11-03T05:19:02.000000Z

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

cs.AI updates on arXiv.org 2025-10-31T04:09:59.000000Z

Evaluating ChatGPT's Performance in Classifying Pneumonia from Chest X-Ray Images

cs.AI updates on arXiv.org 2025-10-28T04:11:10.000000Z

RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

机器之心 (Synced) 2025-10-21T08:56:11.000000Z

ZeST: an LLM-based Zero-Shot Traversability Navigation for Unknown Environments

cs.AI updates on arXiv.org 2025-10-21T04:30:24.000000Z

SSL4RL: Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning

cs.AI updates on arXiv.org 2025-10-21T04:24:26.000000Z

RECODE: Reasoning Through Code Generation for Visual Question Answering

cs.AI updates on arXiv.org 2025-10-16T04:29:39.000000Z

CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs

cs.AI updates on arXiv.org 2025-10-15T04:58:19.000000Z

EncQA: Benchmarking Vision-Language Models on Visual Encodings for Charts

machinelearning apple 2025-10-13T22:38:40.000000Z

Ling-1T, with 1 Trillion Parameters, Is Open-Sourced: A Strong Showing for Chinese-Developed LLMs

PaperAgent 2025-10-10T10:09:13.000000Z

To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models

cs.AI updates on arXiv.org 2025-10-10T04:19:09.000000Z

VCoT-Grasp: Grasp Foundation Models with Visual Chain-of-Thought Reasoning for Language-driven Grasp Generation

cs.AI updates on arXiv.org 2025-10-08T04:14:35.000000Z

ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering

cs.AI updates on arXiv.org 2025-10-07T04:09:11.000000Z

RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

cs.AI updates on arXiv.org 2025-10-03T04:18:42.000000Z

Can World Models Benefit VLMs for World Dynamics?

cs.AI updates on arXiv.org 2025-10-02T04:18:30.000000Z

NeurIPS 2025 | UniPixel: The First Pixel-Level Reasoning Framework to Unify Object Referring and Segmentation, Letting Large Models Understand Every Pixel

我爱计算机视觉 (I Love Computer Vision) 2025-10-01T09:39:51.000000Z

NeurIPS 2025 Spotlight | FSDrive Unifies VLA and World Models, Advancing Autonomous Driving Toward Visual Reasoning

机器之心 (Synced) 2025-09-30T14:49:40.000000Z

Copyright © 2019 FISHAI. All Rights Reserved