Articles related to "Off-Policy Reinforcement Learning"
RL without TD learning
The Berkeley Artificial Intelligence Research Blog
2025-11-07T07:20:30.000000Z
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
cs.AI updates on arXiv.org
2025-09-30T04:06:21.000000Z
Functional Critic Modeling for Provably Convergent Off-Policy Actor-Critic
cs.AI updates on arXiv.org
2025-09-30T04:03:46.000000Z