Articles related to "mid-training"
RA3: Mid-Training with Temporal Action Abstractions for Faster Reinforcement Learning (RL) Post-Training in Code LLMs
MarkTechPost@AI
2025-10-09T06:24:24.000000Z
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
cs.AI updates on arXiv.org
2025-10-01T06:01:01.000000Z
RL Isn't Just for Qwen! "Mid-Training" Lets Llama Evolve Overnight as OctoThinker Bursts onto the Scene
PaperWeekly
2025-07-01T12:03:48.000000Z