cs.AI updates on arXiv.org, Sep 23, 14:03
Feature Representation and Processing Mechanisms in Vision Transformers

This paper systematically analyzes 6.6K features of vision transformers (ViTs), revealing how features evolve from low-level patterns to high-level semantics. It introduces a residual replacement model that simplifies the original computations using interpretable features, enabling an intuitive understanding of ViT mechanisms, and demonstrates its application to debiasing spurious correlations.

arXiv:2509.17401v1 Announce Type: cross Abstract: How do vision transformers (ViTs) represent and process the world? This paper addresses this long-standing question through the first systematic analysis of 6.6K features across all layers, extracted via sparse autoencoders, and by introducing the residual replacement model, which replaces ViT computations with interpretable features in the residual stream. Our analysis reveals not only a feature evolution from low-level patterns to high-level semantics, but also how ViTs encode curves and spatial positions through specialized feature types. The residual replacement model scalably produces a faithful yet parsimonious circuit for human-scale interpretability by significantly simplifying the original computations. As a result, this framework enables intuitive understanding of ViT mechanisms. Finally, we demonstrate the utility of our framework in debiasing spurious correlations.
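The pipeline described in the abstract (extracting per-layer features from the residual stream with sparse autoencoders, then describing ViT computation in terms of those features) can be illustrated with a small sketch. The snippet below is a generic tied-bias sparse autoencoder trained on cached per-patch residual activations; it is not the paper's implementation, and names such as `PatchSAE`, `d_model`, and `n_features`, as well as the placeholder activation tensor, are assumptions for illustration only.

```python
# Illustrative sketch only: a minimal sparse autoencoder (SAE) over
# ViT residual-stream activations. Not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchSAE(nn.Module):
    """Sparse autoencoder over per-patch residual-stream vectors."""

    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.enc = nn.Linear(d_model, n_features)
        self.dec = nn.Linear(n_features, d_model, bias=False)
        self.b_dec = nn.Parameter(torch.zeros(d_model))  # shared decoder bias

    def forward(self, x):
        # x: (n_samples, d_model) residual-stream vectors from one ViT layer
        f = F.relu(self.enc(x - self.b_dec))   # sparse feature activations
        x_hat = self.dec(f) + self.b_dec       # reconstruction of the residual
        return x_hat, f


def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse features.
    return F.mse_loss(x_hat, x) + l1_coeff * f.abs().mean()


# Toy training loop on stand-in activations. In practice these would be
# hooked from a real ViT layer; d_model=768 matches ViT-B but is an
# assumption here, as is the feature count.
d_model, n_features = 768, 4096
sae = PatchSAE(d_model, n_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(1024, d_model)  # placeholder for cached residual activations

for step in range(100):
    x_hat, f = sae(acts)
    loss = sae_loss(acts, x_hat, f)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the same spirit, a residual-replacement-style check would substitute the reconstruction `x_hat` for the original residual vector at a given layer and measure how much the ViT's output changes, giving a fidelity test for the simplified, feature-level description of the computation.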


Related tags

Vision Transformer, feature representation, processing mechanisms, residual replacement model, debiasing spurious correlations