A Study of the Adam Optimizer's Sensitivity to Rotations of the Parameter Space

cs.AI updates on arXiv.org, two days ago at 14:31

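This paper studies how sensitive the Adam optimizer is to rotations of the parameter space, shows how Adam behaves under different types of rotation, and identifies a candidate key quantity for rotation-dependent theoretical frameworks.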

arXiv:2410.19964v2 Announce Type: replace-cross Abstract: Despite its widespread adoption, Adam's advantage over Stochastic Gradient Descent (SGD) lacks a comprehensive theoretical explanation. This paper investigates Adam's sensitivity to rotations of the parameter space. We observe that Adam's performance in training transformers degrades under random rotations of the parameter space, indicating a crucial sensitivity to the choice of basis in practice. This reveals that conventional rotation-invariant assumptions are insufficient to capture Adam's advantages theoretically. To better understand the rotation-dependent properties that benefit Adam, we also identify structured rotations that preserve or even enhance its empirical performance. We then examine the rotation-dependent assumptions in the literature and find that they fall short in explaining Adam's behaviour across various rotation types. In contrast, we verify the orthogonality of the update as a promising indicator of Adam's basis sensitivity, suggesting it may be the key quantity for developing rotation-dependent theoretical frameworks that better explain its empirical success.
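The basis sensitivity described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's experimental setup: it assumes a small quadratic loss in place of a transformer, uses a textbook Adam update, and all function names (`run_sgd`, `run_adam`, `random_rotation`, `rotation_gap`) are hypothetical helpers introduced for illustration. It runs each optimizer once in the original coordinates and once in randomly rotated coordinates, then maps the rotated result back and measures the difference.

```python
import numpy as np


def run_sgd(grad_fn, w0, lr, steps):
    """Plain gradient descent (no noise, for a deterministic comparison)."""
    w = w0.copy()
    for _ in range(steps):
        w = w - lr * grad_fn(w)
    return w


def run_adam(grad_fn, w0, lr, steps, b1=0.9, b2=0.999, eps=1e-8):
    """Textbook Adam with bias correction and elementwise normalization."""
    w = w0.copy()
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


def random_rotation(dim, rng):
    """Sample a random rotation matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))      # make the factorization unique
    if np.linalg.det(q) < 0:         # force determinant +1 (proper rotation)
        q[:, 0] = -q[:, 0]
    return q


def rotation_gap(optimizer, A, w0, R, lr, steps):
    """Distance between the plain run and the rotated run mapped back.

    Toy loss (an assumption of this sketch): f(w) = 0.5 * w^T A w.
    Rotated coordinates: v = R w, so the gradient w.r.t. v is R A R^T v.
    """
    plain = optimizer(lambda w: A @ w, w0, lr, steps)
    rotated = optimizer(lambda v: R @ (A @ (R.T @ v)), R @ w0, lr, steps)
    return np.linalg.norm(R.T @ rotated - plain)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 8
    A = np.diag(np.logspace(-2, 0, dim))   # ill-conditioned, axis-aligned
    w0 = rng.standard_normal(dim)
    R = random_rotation(dim, rng)
    print("SGD gap :", rotation_gap(run_sgd, A, w0, R, lr=0.1, steps=20))
    print("Adam gap:", rotation_gap(run_adam, A, w0, R, lr=0.01, steps=20))
```

In this sketch the SGD gap should be at the level of floating-point error, because a gradient step commutes with an orthogonal change of basis, while the Adam gap should be clearly nonzero, because the elementwise division by the second-moment estimate depends on the chosen coordinates. The toy only shows that Adam's trajectory changes under rotation; it says nothing about whether performance degrades, which is the paper's empirical claim for transformers.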
