Trending
Articles related to "process supervision"
Tsinghua University and Kuaishou propose AttnRL: letting large models explore with "attention"
Synced (机器之心) 2025-10-21T14:51:01.000000Z
COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes
cs.AI updates on arXiv.org 2025-10-17T04:18:59.000000Z
Learning a Dense Reasoning Reward Model from Expert Demonstration via Inverse Reinforcement Learning
cs.AI updates on arXiv.org 2025-10-03T04:07:45.000000Z
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
cs.AI updates on arXiv.org 2025-09-30T04:02:26.000000Z
🚫 A universal Agent as a fallback: how AI can rescue itself when the plan is missing a tool
Juejin (掘金), Artificial Intelligence 2025-09-15T07:47:56.000000Z
SSPO: Self-traced Step-wise Preference Optimization for Process Supervision and Reasoning Compression
cs.AI updates on arXiv.org 2025-08-19T04:21:32.000000Z
Altman: ChatGPT was just an accident, an all-capable AI agent is the real goal! Karpathy: I thought of this 7 years ago
BAAI Community (智源社区) 2025-08-05T14:11:53.000000Z
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
cs.AI updates on arXiv.org 2025-07-31T04:48:18.000000Z
Peking University team proposes the LongRePS framework: a chain-of-thought process supervision approach for long-context scenarios
PaperWeekly 2025-03-13T12:17:53.000000Z
Outcome-Refining Process Supervision: Advancing Code Generation with Structured Reasoning and Execution Feedback
MarkTechPost@AI 2025-01-14T17:49:56.000000Z
Researchers from SynthLabs and Stanford Propose Meta Chain-of-Thought (Meta-CoT): An AI Framework for Improving LLM Reasoning
MarkTechPost@AI 2025-01-09T03:42:34.000000Z
Google DeepMind Researchers Propose a Novel Divide-and-Conquer Style Monte Carlo Tree Search (MCTS) Algorithm ‘OmegaPRM’ for Efficiently Collecting High-Quality Process Supervision Data
MarkTechPost@AI 2024-06-16T09:31:36.000000Z