Trending
Articles related to "process rewards"
Language Server CLI Empowers Language Agents with Process Rewards
cs.AI updates on arXiv.org 2025-10-28T04:14:34.000000Z
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
cs.AI updates on arXiv.org 2025-10-10T04:12:47.000000Z
Tackling AI overthinking! New Meituan research uses "verifiable" process rewards to unlock efficient reasoning in LRMs
BAAI Community (智源社区) 2025-09-12T13:23:03.000000Z
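Process supervision > outcome supervision! Huawei and CityU Hong Kong rework RAG reasoning training; with 5k samples, performance overtakes the 90k model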
PaperWeekly 2025-06-03T06:42:32.000000Z
PRIME: An Open-Source Solution for Online Reinforcement Learning with Process Rewards to Advance Reasoning Abilities of Language Models Beyond Imitation or Distillation
MarkTechPost@AI 2025-01-05T02:45:09.000000Z
Revolutionizing LLM Alignment: A Deep Dive into Direct Q-Function Optimization
MarkTechPost@AI 2024-12-31T06:19:48.000000Z
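Process reward models (PRMs) become the go-to answer! Google DeepMind's PAV fully automates step-by-step reward labeling, improving accuracy by 8%
BAAI Community (智源社区) 2024-11-17T11:52:12.000000Z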
ReST-MCTS*! Reinforced self-training lets large models keep "leveling up"
GLM Large Model (GLM大模型) 2024-11-05T10:10:45.000000Z