cs.AI updates on arXiv.org 07月09日
2048: Reinforcement Learning in a Delayed Reward Environment
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

文章提出一种多步强化学习框架H-DQN,在2048游戏上实现高分,展示其在稀疏奖励环境中的性能提升潜力。

arXiv:2507.05465v1 Announce Type: cross Abstract: Delayed and sparse rewards present a fundamental obstacle for reinforcement-learning (RL) agents, which struggle to assign credit for actions whose benefits emerge many steps later. The sliding-tile game 2048 epitomizes this challenge: although frequent small score changes yield immediate feedback, they often mislead agents into locally optimal but globally suboptimal strategies. In this work, we introduce a unified, distributional multi-step RL framework designed to directly optimize long-horizon performance. Using the open source Gym-2048 environment we develop and compare four agent variants: standard DQN, PPO, QR-DQN (Quantile Regression DQN), and a novel Horizon-DQN (H-DQN) that integrates distributional learning, dueling architectures, noisy networks, prioritized replay, and more. Empirical evaluation reveals a clear hierarchy in effectiveness: max episode scores improve from 3.988K (DQN) to 5.756K (PPO), 8.66K (QR-DQN), and 18.21K (H-DQN), with H-DQN reaching the 2048 tile. Upon scaling H-DQN it reaches a max score 41.828K and a 4096 tile. These results demonstrate that distributional, multi-step targets substantially enhance performance in sparse-reward domains, and they suggest promising avenues for further gains through model-based planning and curriculum learning.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

强化学习 H-DQN 2048游戏 性能提升
相关文章