cs.AI updates on arXiv.org, August 5
Is Exploration or Optimization the Problem for Deep Reinforcement Learning?

This paper proposes a new method for estimating the optimization difficulties of deep reinforcement learning algorithms. Experiments show that current deep RL exploits only half of the good experience it generates, providing a reference point for improving these algorithms.

arXiv:2508.01329v1 Announce Type: cross Abstract: In the era of deep reinforcement learning, making progress is more complex, as the collected experience must be compressed into a deep model for future exploitation and sampling. Many papers have shown that training a deep learning policy under a changing state and action distribution leads to sub-optimal performance, or even collapse. This naturally leads to the concern that even if the community creates improved exploration algorithms or reward objectives, those improvements may fall on the "deaf ears" of optimization difficulties. This work proposes a new, practical sub-optimality estimator to determine the optimization limitations of deep reinforcement learning algorithms. Through experiments across environments and RL algorithms, it is shown that the best experience generated is 2-3× better than the policies' learned performance. This large gap indicates that deep RL methods exploit only half of the good experience they generate.
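The abstract does not spell out how the estimator is computed, but the comparison it describes, the best return found in the collected experience versus the learned policy's evaluation return, can be sketched in a few lines. The sketch below is an illustrative assumption, not the paper's implementation; the function name suboptimality_ratio and its signature are invented for this example, and it assumes episodic returns are positive so the ratio is meaningful.

from typing import Callable, Sequence


def suboptimality_ratio(experience_returns: Sequence[float],
                        eval_policy: Callable[[], float],
                        n_eval_episodes: int = 10) -> float:
    """Ratio of the best episodic return found in the collected
    experience to the learned policy's mean evaluation return.

    A ratio of roughly 2-3x, as the abstract reports, would mean the
    policy recovers only a fraction of the best behaviour it has
    already generated. Assumes positive returns.
    """
    best_experience = max(experience_returns)
    mean_policy_return = (
        sum(eval_policy() for _ in range(n_eval_episodes)) / n_eval_episodes
    )
    return best_experience / mean_policy_return


# Example with dummy numbers: the best trajectory returned 300 while the
# policy averages 100 per evaluation episode, giving a ratio of 3x,
# inside the range the abstract reports.
print(suboptimality_ratio([120.0, 300.0, 95.0], lambda: 100.0, n_eval_episodes=3))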

