Optimizing AI Safety Attack Strategies Under Limited Compute

cs.AI updates on arXiv.org, November 3, 13:18

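This paper addresses the problem of maximizing attack strength under limited compute in AI safety research. It proposes a fine-grained control mechanism that selectively recomputes layer activations, lowering computational cost while preserving attack effectiveness. Experiments show that the method outperforms existing baselines at equal cost and that, in adversarial training, it reaches performance comparable to the original budget using only 30% of the compute.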

arXiv:2510.26981v1 Announce Type: cross

Abstract: This work tackles a critical challenge in AI safety research under limited compute: given a fixed computation budget, how can one maximize the strength of iterative adversarial attacks? Coarsely reducing the number of attack iterations lowers cost but substantially weakens effectiveness. To fulfill the attainable attack efficacy within a constrained budget, we propose a fine-grained control mechanism that selectively recomputes layer activations across both iteration-wise and layer-wise levels. Extensive experiments show that our method consistently outperforms existing baselines at equal cost. Moreover, when integrated into adversarial training, it attains comparable performance with only 30% of the original budget.
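The abstract does not spell out the mechanism, so the following is only a toy illustration of the general idea of selective recomputation, not the authors' algorithm. It assumes a small residual network (each block computes h + tanh(W h)), an L-infinity sign-gradient attack, and a boolean schedule with one entry per (iteration, block). The function name `budgeted_attack`, the schedule format, and the choice to treat a stale block as a constant in the backward pass are all assumptions made for this sketch. Cost is counted as the number of block evaluations.

```python
import numpy as np

def budgeted_attack(x0, y, Ws, v, schedule, eps=0.1, step=0.02):
    """Toy sign-gradient attack with per-iteration, per-block recomputation.

    Hypothetical sketch: schedule[t][l] says whether block l is recomputed
    at iteration t; otherwise its cached branch output is reused.
    Returns (adversarial input, number of block evaluations).
    """
    x = x0.copy()
    cache = [None] * len(Ws)  # last computed tanh(W @ h) for each block
    cost = 0
    for row in schedule:
        # Forward pass: h <- h + tanh(W h), reusing stale branches if allowed.
        h = x
        fresh = []
        for l, W in enumerate(Ws):
            is_fresh = bool(row[l]) or cache[l] is None
            if is_fresh:
                cache[l] = np.tanh(W @ h)
                cost += 1
            fresh.append(is_fresh)
            h = h + cache[l]
        # Backward pass for the score s = v . h_L. A stale branch is treated
        # as a constant (assumption), so only the skip path carries gradient.
        g = v.copy()
        for l in reversed(range(len(Ws))):
            if fresh[l]:
                a = cache[l]
                g = g + Ws[l].T @ ((1.0 - a ** 2) * g)
        # Loss is -y * s; ascend it, then project back into the eps-ball.
        grad_loss = -y * g
        x = np.clip(x + step * np.sign(grad_loss), x0 - eps, x0 + eps)
    return x, cost

rng = np.random.default_rng(0)
dim, n_blocks, n_iters = 8, 4, 10
Ws = [0.3 * rng.standard_normal((dim, dim)) for _ in range(n_blocks)]
v = rng.standard_normal(dim)
x0 = rng.standard_normal(dim)
y = 1.0

# Full recomputation: every block at every iteration.
full = np.ones((n_iters, n_blocks), dtype=bool)
# Sparse schedule: everything once, then only the last block each iteration.
sparse = np.zeros((n_iters, n_blocks), dtype=bool)
sparse[0, :] = True
sparse[:, -1] = True

x_full, cost_full = budgeted_attack(x0, y, Ws, v, full)
x_sparse, cost_sparse = budgeted_attack(x0, y, Ws, v, sparse)
print("block evaluations:", cost_full, "vs", cost_sparse)
```

The sparse schedule keeps the same number of attack iterations but spends fewer block evaluations, which is the kind of trade-off the abstract contrasts with simply cutting iterations. Whether a given schedule preserves attack strength is an empirical question that this sketch does not answer.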
