Exploration-Exploitation Trade-off_Fishai

Articles related to "Exploration-Exploitation Trade-off"

Neighboring State-based Exploration for Reinforcement Learning

cs.AI updates on arXiv.org 2025-11-05T05:31:28.000000Z

Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning

cs.AI updates on arXiv.org 2025-10-14T04:09:21.000000Z

Should You Use Your Large Language Model to Explore or Exploit?

cs.AI updates on arXiv.org 2025-10-01T06:02:32.000000Z

Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

cs.AI updates on arXiv.org 2025-09-29T04:16:37.000000Z

The Multi-Armed Bandit Problem and Its Solutions

Lil'Log 2025-09-25T10:02:22.000000Z

Copyright © 2019 FISHAI. All Rights Reserved