cs.AI updates on arXiv.org 07月29日
Interpretable Open-Vocabulary Referring Object Detection with Reverse Contrast Attention
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出Reverse Contrast Attention(RCA)方法,通过抑制极端权重并增强中等权重,提升视觉语言模型对象定位精度,并在多个开源模型上验证其有效性。

arXiv:2507.19891v1 Announce Type: cross Abstract: We propose Reverse Contrast Attention (RCA), a plug-in method that enhances object localization in vision-language transformers without retraining. RCA reweights final-layer attention by suppressing extremes and amplifying mid-level activations to let semantically relevant but subdued tokens guide predictions. We evaluate it on Open Vocabulary Referring Object Detection (OV-RefOD), introducing FitAP, a confidence-free average precision metric based on IoU and box area. RCA improves FitAP in 11 out of 15 open-source VLMs, with gains up to $+26.6\%$. Effectiveness aligns with attention sharpness and fusion timing; while late-fusion models benefit consistently, models like $\texttt{DeepSeek-VL2}$ also improve, pointing to capacity and disentanglement as key factors. RCA offers both interpretability and performance gains for multimodal transformers.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

RCA 视觉语言模型 对象定位 性能提升
相关文章