Trending
Articles related to "Sandwiched Policy Gradient"
Can diffusion language models do reinforcement learning too? Meta's Yuandong Tian team closes the RL loop with a "sandwich gradient"
PaperWeekly 2025-10-21T05:27:14.000000Z
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
cs.AI updates on arXiv.org 2025-10-13T04:14:41.000000Z