A Concept-Explanation-Based Defense Strategy Against Adversarial Patches

cs.AI updates on arXiv.org, October 7

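This paper proposes a defense against adversarial patches. It uses concept-based explanations to identify and suppress the most influential concept activation vectors, which neutralizes the patch's effect. Experiments show that the method maintains high robust accuracy and high clean accuracy, and that it performs well across different patch sizes and locations.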

arXiv:2510.04245v1 Announce Type: cross

Abstract: Adversarial patch attacks pose a practical threat to deep learning models by forcing targeted misclassifications through localized perturbations, often realized in the physical world. Existing defenses typically assume prior knowledge of patch size or location, limiting their applicability. In this work, we propose a patch-agnostic defense that leverages concept-based explanations to identify and suppress the most influential concept activation vectors, thereby neutralizing patch effects without explicit detection. Evaluated on Imagenette with a ResNet-50, our method achieves higher robust and clean accuracy than the state-of-the-art PatchCleanser, while maintaining strong performance across varying patch sizes and locations. Our results highlight the promise of combining interpretability with robustness and suggest concept-driven defenses as a scalable strategy for securing machine learning models against adversarial patch attacks.
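The abstract does not say how the concepts are extracted or how their influence is scored, so the following is only a minimal sketch of the general idea under loudly stated assumptions. It assumes a fixed, nonnegative concept bank (for example one obtained beforehand by NMF on clean activations). It also assumes a classifier head made of global average pooling plus a linear layer, and measures a concept's influence as its additive contribution to the predicted-class logit. All function names, sizes, and the random toy data are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def fit_coefficients(acts, concepts, n_iter=200, eps=1e-8, seed=0):
    """Nonnegative coefficients U with acts ~= U @ concepts, concept bank held fixed.

    Lee-Seung multiplicative updates for the left NMF factor.
    acts: (P, d) nonnegative activations at P spatial positions.
    concepts: (k, d) nonnegative concept bank (assumed precomputed).
    """
    rng = np.random.default_rng(seed)
    coeffs = rng.random((acts.shape[0], concepts.shape[0]))
    for _ in range(n_iter):
        numer = acts @ concepts.T                      # (P, k)
        denom = coeffs @ concepts @ concepts.T + eps   # (P, k), eps avoids division by zero
        coeffs *= numer / denom                        # keeps coeffs nonnegative
    return coeffs


def classify(acts, head_w, head_b):
    """Hypothetical head: global average pooling followed by a linear layer."""
    return acts.mean(axis=0) @ head_w + head_b


def concept_contributions(coeffs, concepts, head_w, cls):
    """Contribution of each concept to the logit of class `cls`.

    With a linear head: mean_p(U @ W) @ V[:, c] = sum_j mean_p(U[:, j]) * (W[j] @ V[:, c]).
    """
    return coeffs.mean(axis=0) * (concepts @ head_w[:, cls])


def suppress_top_concepts(acts, concepts, head_w, head_b, n_suppress=1):
    """Remove the n_suppress concepts that most support the current prediction."""
    pred = int(np.argmax(classify(acts, head_w, head_b)))
    coeffs = fit_coefficients(acts, concepts)
    contrib = concept_contributions(coeffs, concepts, head_w, pred)
    top = np.argsort(contrib)[::-1][:n_suppress]  # most influential concepts first
    # Subtract only the suppressed concepts' reconstruction; the unexplained residual is kept.
    cleaned = acts - coeffs[:, top] @ concepts[top]
    return cleaned, pred, top, contrib


# Toy usage with random data (hypothetical sizes: 7x7 positions, 64 channels, 8 concepts, 10 classes).
rng = np.random.default_rng(42)
P, d, k, C = 49, 64, 8, 10
concepts = rng.random((k, d))
acts = rng.random((P, k)) @ concepts  # toy activations built from the concept bank
head_w = rng.normal(size=(d, C))
head_b = np.zeros(C)

cleaned, pred, top, contrib = suppress_top_concepts(acts, concepts, head_w, head_b, n_suppress=2)
print("prediction before:", pred)
print("suppressed concepts:", top)
print("prediction after:", int(np.argmax(classify(cleaned, head_w, head_b))))
```

Subtracting only the suppressed concepts, rather than rebuilding the whole activation map from the concept bank, leaves the part of the representation that the bank does not explain untouched. With a linear head, the predicted-class logit then drops by exactly the summed contributions of the suppressed concepts. Whether the real method scores influence this way, or suppresses a fixed number of concepts, is not stated in the abstract.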

Related tags

Adversarial attacks, Deep learning, Concept explanations, Robustness, Machine learning