Trending
Articles related to "Reward Modeling"
In RL fine-tuning, the top 10% of rewards are key! Scale AI and others propose a new rubric-based method
新智元 2025-10-16T21:05:15.000000Z
From Scores to Preferences: Redefining MOS Benchmarking for Speech Quality Reward Modeling
cs.AI updates on arXiv.org 2025-10-02T04:18:20.000000Z
RewardDance: ByteDance proposes a new paradigm for scaling rewards in visual generation, tackling the "reward hacking" problem
我爱计算机视觉 2025-09-12T12:46:40.000000Z
RewardDance: ByteDance proposes a new paradigm for scaling rewards in visual generation, tackling the "reward hacking" problem
我爱计算机视觉 2025-09-11T17:12:15.000000Z
ICML 2025 | Still labeling reward models by hand? APEC generates preferences via adversarial imitation, with a sharp gain in generalization
PaperWeekly 2025-08-13T16:23:08.000000Z
From scorer to thinker: RM-R1 uses reasoning to reshape how models make value judgments
机器之心 2025-05-31T08:21:30.000000Z
Is DeepSeek R2 here? New inference-time scaling paper released jointly with Tsinghua!
华尔街见闻 - Top Articles 2025-04-05T02:42:35.000000Z
This AI Paper Introduces Agentic Reward Modeling (ARM) and REWARDAGENT: A Hybrid AI Approach Combining Human Preferences and Verifiable Correctness for Reliable LLM Training
MarkTechPost@AI 2025-03-01T05:16:07.000000Z
Tips for LLM Pretraining and Evaluating Reward Models
Ahead of AI 2024-10-22T06:07:40.000000Z
My disagreements with "AGI ruin: A List of Lethalities"
少点错误 2024-09-15T17:22:44.000000Z