Trending
Articles related to "RL Training"
If RL Is Predictable, Do We Still Need to Run Training to Completion? USTC Reveals the Linear Secret of Parameter Updates
PaperWeekly 2025-10-14T14:42:26.000000Z
Guiding Exploration in Reinforcement Learning Through LLM-Augmented Observations
cs.AI updates on arXiv.org 2025-10-13T04:13:19.000000Z
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
cs.AI updates on arXiv.org 2025-08-20T04:17:09.000000Z
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
cs.AI updates on arXiv.org 2025-07-04T04:08:34.000000Z
Demystifying Prompts Series 51: A Discussion of Some Details of the R1 Experiments
Juejin (掘金) Artificial Intelligence 2025-04-02T23:42:45.000000Z
One Sentence Makes DeepSeek Unable to Stop Thinking. Peking University Team: This Is a DDoS Attack Against AI
Kuai Technology News (快科技资讯) 2025-03-04T11:29:47.000000Z
One Sentence Makes DeepSeek Unable to Stop Thinking: Someone Is Attacking AI Again
Huxiu AI (虎嗅-AI) 2025-03-02T03:22:35.000000Z
One Sentence Makes DeepSeek Unable to Stop Thinking. Peking University Team: This Is a DDoS Attack Against AI
BAAI Community (智源社区) 2025-03-01T09:07:15.000000Z
Linguistic Imperialism in AI: Enforcing Human-Readable Chain-of-Thought
LessWrong 2025-02-21T15:49:46.000000Z
Kimi Official Retrospective: The Thought Process Behind k1.5's Reproduction of o1
Founder Park 2025-01-23T17:14:55.000000Z
Quick recap on the state of reasoning
Interconnects 2025-01-02T16:05:53.000000Z