Articles related to "Sandwiched Policy Gradient"
Can diffusion language models do reinforcement learning too? Meta's Yuandong Tian team closes the RL loop with a "sandwiched gradient"
PaperWeekly
2025-10-21T05:27:14.000000Z
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
cs.AI updates on arXiv.org
2025-10-13T04:14:41.000000Z