TinyDrop: A New Framework for Reducing the Computational Cost of ViTs

cs.AI updates on arXiv.org, September 4

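This paper proposes TinyDrop, a training-free token-dropping framework guided by a lightweight vision model, designed to reduce the inference cost of large ViTs without sacrificing accuracy. On standard image classification benchmarks, TinyDrop reduces FLOPs by up to 80% with minimal accuracy loss, demonstrating its generality and practical utility.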
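To see why discarding tokens translates into large FLOP savings, a rough per-block cost model helps. The sketch below is an illustrative back-of-the-envelope estimate, not the paper's FLOP accounting. It counts only the dense matrix multiplies in a standard ViT encoder block, assumes the same keep ratio in every block, and ignores both the guidance model's own cost and the patch-embedding layer. The function names are hypothetical.

```python
def encoder_block_macs(n_tokens: float, dim: int, mlp_ratio: float = 4.0) -> float:
    """Approximate multiply-accumulates of one standard ViT encoder block.

    Counts only dense matmuls (illustrative assumption):
      QKV projection          3 * N * D^2
      attention scores + AV   2 * N^2 * D
      output projection       1 * N * D^2
      two MLP layers          2 * mlp_ratio * N * D^2
    LayerNorm, softmax and GELU are ignored.
    """
    n, d = n_tokens, dim
    return (4 + 2 * mlp_ratio) * n * d * d + 2 * n * n * d


def relative_cost(keep_ratio: float, n_tokens: int, dim: int) -> float:
    """Cost of a block on the kept tokens relative to the full token set.

    FLOPs are roughly 2x MACs, so the ratio is the same for both.
    """
    kept = keep_ratio * n_tokens
    return encoder_block_macs(kept, dim) / encoder_block_macs(n_tokens, dim)
```

For ViT-B/16 at 224x224 resolution (197 tokens, width 768), `relative_cost(0.2, 197, 768)` is about 0.19. Under this model, keeping roughly one token in five cuts block FLOPs by about 80%. The cost is nearly linear in the token count because the terms proportional to N·D² dominate the quadratic attention terms proportional to N²·D at this size. The abstract does not state which keep ratios or layers TinyDrop actually uses.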

arXiv:2509.03379v1 Announce Type: cross

Abstract: Vision Transformers (ViTs) achieve strong performance in image classification but incur high computational costs from processing all image tokens. To reduce inference costs in large ViTs without compromising accuracy, we propose TinyDrop, a training-free token-dropping framework guided by a lightweight vision model. The guidance model estimates token importance during inference, so that low-importance tokens can be selectively discarded when the large ViT performs attention computation. The framework is plug-and-play, requires no architectural modifications, and is compatible with diverse ViT architectures. Evaluations on standard image classification benchmarks show that our framework reduces FLOPs by up to 80% for ViTs with minimal accuracy degradation, highlighting its generalization capability and practical utility for efficient ViT-based classification.
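The abstract describes the mechanism only at a high level: a small model scores tokens, and the large ViT then processes only the important ones. Below is a minimal sketch of the dropping step under several stated assumptions. The guidance model and the large ViT are assumed to share the same patch grid. Importance scores are assumed to be precomputed, for instance from the small model's CLS-token attention, although the abstract does not say how TinyDrop estimates importance. A single fixed keep ratio is also assumed. The names `drop_tokens`, `keep_ratio` and `num_prefix` are hypothetical and are not TinyDrop's API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def drop_tokens(tokens: Sequence[T], scores: Sequence[float],
                keep_ratio: float, num_prefix: int = 1) -> List[T]:
    """Keep the top `keep_ratio` fraction of patch tokens by importance.

    Hypothetical sketch, not the paper's implementation.
    tokens:     [CLS, patch_1, ..., patch_N] for the large ViT (assumed layout).
    scores:     one importance score per patch token, assumed to come from a
                lightweight guidance model on the same patch grid.
    num_prefix: number of leading special tokens (e.g. CLS), always kept.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    prefix, patches = list(tokens[:num_prefix]), list(tokens[num_prefix:])
    if len(scores) != len(patches):
        raise ValueError("need exactly one score per patch token")
    k = max(1, round(keep_ratio * len(patches)))
    # Indices of the k highest-scoring patches; ties go to the earlier patch.
    top = sorted(range(len(patches)), key=lambda i: (-scores[i], i))[:k]
    # Re-sort the kept indices so surviving patches stay in original order.
    return prefix + [patches[i] for i in sorted(top)]


if __name__ == "__main__":
    tokens = ["cls", "p0", "p1", "p2", "p3", "p4"]
    scores = [0.1, 0.9, 0.3, 0.8, 0.2]  # made-up scores for illustration
    print(drop_tokens(tokens, scores, keep_ratio=0.4))  # ['cls', 'p1', 'p3']
```

In a standard ViT with absolute positional embeddings added at the input, each surviving token already carries its own position, so the large model's attention does not depend on token order. Preserving the original order is mainly a convenience for debugging and for any later step that expects spatial ordering.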
