cs.AI updates on arXiv.org · October 21, 12:28
Robustness Analysis of Automatically Tuned Lagrange Multipliers

This paper studies the optimization of Lagrange multipliers in safe reinforcement learning, analyzes the robustness and stability of automatic multiplier updates, and demonstrates their performance across a range of optimization tasks through experiments.

arXiv:2510.17564v1 Announce Type: cross Abstract: In safety-critical domains such as robotics, navigation, and power systems, constrained optimization problems arise in which maximizing performance must be carefully balanced against safety constraints. Safe reinforcement learning provides a framework for addressing these challenges, with Lagrangian methods being a popular choice. However, the effectiveness of Lagrangian methods depends crucially on the choice of the Lagrange multiplier $\lambda$, which governs the trade-off between return and constraint cost. A common approach is to update the multiplier automatically during training. Although this is standard in practice, there is limited empirical evidence on the robustness of automated updates and their influence on overall performance. We therefore analyze (i) the optimality and (ii) the stability of Lagrange multipliers in safe reinforcement learning across a range of tasks. We provide $\lambda$-profiles that give a complete visualization of the trade-off between return and constraint cost in the optimization problem. These profiles show how sensitive performance is to $\lambda$ and confirm that there is no general intuition for choosing the optimal value $\lambda^*$. Our findings additionally show that automated multiplier updates can recover, and sometimes even exceed, the optimal performance found at $\lambda^*$, owing to the large differences in their learning trajectories. Furthermore, we show that automated multiplier updates exhibit oscillatory behavior during training, which can be mitigated through PID-controlled updates. However, this method requires careful tuning to achieve consistently better performance across tasks, highlighting the need for further research on stabilizing Lagrangian methods in safe reinforcement learning. The code used to reproduce our results can be found at https://github.com/lindsayspoor/Lagrangian_SafeRL.
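To make the two update schemes in the abstract concrete, below is a minimal Python sketch of an automated multiplier. The plain scheme is dual ascent, $\lambda \leftarrow \max(0, \lambda + \eta\,(J_c - d))$, where $J_c$ is the observed constraint cost and $d$ the cost limit; the PID variant adds proportional and derivative terms to damp the oscillations that pure integral control produces. The class names, cost limit, learning rate, and PID gains here are illustrative assumptions, not values from the paper; the authors' actual implementation is in the linked repository.

```python
class DualAscentMultiplier:
    """Plain automated update: lambda <- max(0, lambda + lr * (Jc - d)).

    This is pure integral control, the usual source of oscillation
    around the constraint boundary. (Sketch; hyperparameters are
    illustrative placeholders.)"""

    def __init__(self, lr: float = 0.05, cost_limit: float = 25.0):
        self.lam = 0.0
        self.lr = lr
        self.cost_limit = cost_limit

    def update(self, episode_cost: float) -> float:
        violation = episode_cost - self.cost_limit  # Jc - d
        # Project onto lambda >= 0 so violations are never rewarded.
        self.lam = max(0.0, self.lam + self.lr * violation)
        return self.lam


class PIDMultiplier:
    """PID-controlled update: proportional and derivative terms react to
    the current violation and its rate of change, damping the purely
    integral scheme above. Gains kp/ki/kd are assumed values that would
    need per-task tuning, as the abstract notes."""

    def __init__(self, kp: float = 0.1, ki: float = 0.01, kd: float = 0.01,
                 cost_limit: float = 25.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0
        self.prev_violation = 0.0

    def update(self, episode_cost: float) -> float:
        violation = episode_cost - self.cost_limit
        self.integral = max(0.0, self.integral + violation)
        derivative = max(0.0, violation - self.prev_violation)
        self.prev_violation = violation
        return max(0.0, self.kp * violation
                        + self.ki * self.integral
                        + self.kd * derivative)


# Inside a training loop, the multiplier turns the constrained problem
# into an unconstrained penalized objective:
#     maximize  J(pi) - lambda * (Jc(pi) - d)
multiplier = PIDMultiplier(cost_limit=25.0)
lam = multiplier.update(episode_cost=30.0)  # limit exceeded -> lambda grows
penalized_return = 100.0 - lam * (30.0 - 25.0)
```

A fixed $\lambda$ corresponds to one point on the paper's $\lambda$-profiles; the automated schemes instead trace a trajectory through that trade-off during training, which is why they can end up matching or exceeding the best fixed $\lambda^*$.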

Related tags

Safe reinforcement learning · Lagrange multipliers · Optimization problems