cs.AI updates on arXiv.org, October 21, 12:29
CPT-RL: Applying Cumulative Prospect Theory to Reinforcement Learning

This article discusses the application of Cumulative Prospect Theory (CPT) to reinforcement learning: it presents a new policy gradient theorem and a model-free policy gradient algorithm, and validates the algorithm's effectiveness through simulations.

arXiv:2410.02605v3 Announce Type: replace-cross Abstract: Classical reinforcement learning (RL) typically assumes rational decision-making based on expected utility theory. However, this model has been shown to be empirically inconsistent with actual human preferences, as evidenced in psychology and behavioral economics. Cumulative Prospect Theory (CPT) provides a more nuanced model for human-based decision-making, capturing diverse attitudes and perceptions toward risk, gains, and losses. While prior work has integrated CPT with RL to solve CPT policy optimization problems, the understanding and impact of this formulation remain limited. Our contributions are as follows: (a) we derive a novel policy gradient theorem for CPT objectives, generalizing the foundational result in standard RL, (b) we design a model-free policy gradient algorithm for solving the CPT-RL problem, (c) we analyze our policy gradient estimator and prove asymptotic convergence of the algorithm to first-order stationary points, and (d) we test its performance through simulations. Notably, our first-order policy gradient algorithm scales better than existing zeroth-order methods to larger state spaces. Our theoretical framework offers more flexibility to advance the integration of behavioral decision-making into RL.
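For orientation only: the CPT objective the abstract refers to replaces the expected return with a distorted expectation, applying an S-shaped utility around a reference point and an inverse-S probability weighting to the tails of the return distribution. The sketch below is a minimal, assumption-laden illustration of that value computation on Monte Carlo returns, using the classic Tversky-Kahneman functional forms; it is not the paper's policy gradient estimator, and the function names and parameter defaults (alpha, lam, gamma_plus, gamma_minus) are illustrative choices.

```python
import numpy as np

# Illustrative sketch only (not the paper's estimator): the CPT value of a
# sample of returns, using the Tversky-Kahneman (1992) utility and
# probability-weighting forms. Parameter defaults are assumptions.

def utility(x, alpha=0.88, lam=2.25):
    """S-shaped CPT utility: concave for gains, steeper (loss-averse) for losses."""
    x = np.asarray(x, dtype=float)
    gains = np.clip(x, 0.0, None) ** alpha
    losses = -lam * np.clip(-x, 0.0, None) ** alpha
    return np.where(x >= 0.0, gains, losses)

def weight(p, gamma):
    """Inverse-S probability weighting: overweights small and underweights large probabilities."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

def cpt_value(returns, gamma_plus=0.61, gamma_minus=0.69):
    """Estimate the CPT value of reference-adjusted Monte Carlo returns."""
    x = np.sort(np.asarray(returns, dtype=float))
    n = len(x)
    value = 0.0
    for i, xi in enumerate(x):
        if xi >= 0.0:
            # Decision weight for a gain: difference of weighted upper-tail probabilities.
            w = weight((n - i) / n, gamma_plus) - weight((n - i - 1) / n, gamma_plus)
        else:
            # Decision weight for a loss: difference of weighted lower-tail probabilities.
            w = weight((i + 1) / n, gamma_minus) - weight(i / n, gamma_minus)
        value += w * float(utility(xi))
    return value

# Example: CPT value of returns sampled under a fixed policy.
rng = np.random.default_rng(0)
print(cpt_value(rng.normal(loc=1.0, scale=2.0, size=1000)))
```

A first-order method of the kind the abstract describes would differentiate an objective of this shape with respect to policy parameters, rather than estimating the gradient by perturbing the policy as zeroth-order methods do.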

Related tags

Cumulative Prospect Theory, Reinforcement Learning, Policy Gradient, Model-Free Algorithm