Articles related to "Exploration-Exploitation Trade-off"
Neighboring State-based Exploration for Reinforcement Learning
cs.AI updates on arXiv.org
2025-11-05T05:31:28.000000Z
Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning
cs.AI updates on arXiv.org
2025-10-14T04:09:21.000000Z
Should You Use Your Large Language Model to Explore or Exploit?
cs.AI updates on arXiv.org
2025-10-01T06:02:32.000000Z
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
cs.AI updates on arXiv.org
2025-09-29T04:16:37.000000Z
The Multi-Armed Bandit Problem and Its Solutions
Lil'Log
2025-09-25T10:02:22.000000Z