Mid-Training_Fishai

Articles related to "mid-training"

RA3: Mid-Training with Temporal Action Abstractions for Faster Reinforcement Learning (RL) Post-Training in Code LLMs

MarkTechPost@AI 2025-10-09T06:24:24.000000Z

Learning to Reason as Action Abstractions with Scalable Mid-Training RL

cs.AI updates on arXiv.org 2025-10-01T06:01:01.000000Z

RL Isn't Just for Qwen! "Mid-Training" Lets Llama Evolve Overnight as OctoThinker Bursts onto the Scene

PaperWeekly 2025-07-01T12:03:48.000000Z

Copyright © 2019 FISHAI. All Rights Reserved