Articles related to "ScaleRL"
How to scale RL
Interconnects
2025-10-20T16:39:57.000000Z
Meta Spent 400,000 GPU-Hours on One Experiment, Just to Understand the Scaling Law of Reinforcement Learning
机器之心
2025-10-20T14:24:54.000000Z
Meta Spent $4.2 Million and Burned 400,000 GPU-Hours, Just to Validate a Single Sigmoid Curve
PaperWeekly
2025-10-19T08:34:29.000000Z
Sigmoidal Scaling Curves Make Reinforcement Learning (RL) Post-Training Predictable for LLMs
MarkTechPost@AI
2025-10-18T02:42:51.000000Z
Meta Spent $4.2 Million and Burned 400,000 GPU-Hours, Just to Validate a Single Sigmoid Curve
PaperWeekly
2025-10-17T16:47:03.000000Z