Hot Topics
Articles related to "Preference Alignment"
SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks
cs.AI updates on arXiv.org 2025-11-03T05:20:10.000000Z
Fine-tuning Large Language Models with Limited Data: A Survey and Practical Guide
cs.AI updates on arXiv.org 2025-10-29T04:20:22.000000Z
Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
cs.AI updates on arXiv.org 2025-10-08T04:10:49.000000Z
A Contextual Quality Reward Model for Reliable and Efficient Best-of-N Sampling
cs.AI updates on arXiv.org 2025-10-07T04:16:20.000000Z
Pluralistic Off-policy Evaluation and Alignment
cs.AI updates on arXiv.org 2025-09-25T05:34:12.000000Z
OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination
cs.AI updates on arXiv.org 2025-09-03T04:16:34.000000Z
We All Misjudged GPT-5: Routing Unifies Compute, and Even Free Users Can Generate Revenue
智源社区 2025-08-15T08:16:41.000000Z
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
cs.AI updates on arXiv.org 2025-07-30T04:12:01.000000Z
PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training
cs.AI updates on arXiv.org 2025-07-29T04:21:34.000000Z
PIPA: Preference Alignment as Prior-Informed Statistical Estimation
cs.AI updates on arXiv.org 2025-07-28T04:43:06.000000Z
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
cs.AI updates on arXiv.org 2025-07-24T05:31:34.000000Z
LLaDA, the Flagship Diffusion Language Model, Gets a New Version with Improved Math, Code, and Alignment Capabilities
机器之心 2025-06-07T07:11:41.000000Z
ICML 2025 | RLHF Too Expensive and Slow? TPO, a New On-the-Fly Alignment Approach: Preference Optimization from a Single-Sentence Instruction
PaperWeekly 2025-05-21T06:12:30.000000Z
Bringing RLHF to VLA Models! Optimizing Robot Policies via Preference Alignment, Code Now Open-Source
机器之心 2024-12-27T08:09:02.000000Z
Meet GRAPE: A Plug-and-Play Algorithm to Generalize Robot Policies via Preference Alignment
MarkTechPost@AI 2024-12-08T07:49:27.000000Z
DPO Alignment for Multi-Image Scenarios! Shanghai AI Lab and Collaborators Propose a New Method, No Manual Annotation Needed
智源社区 2024-11-02T10:53:35.000000Z
DPO Alignment for Multi-Image Scenarios: Shanghai AI Lab and Collaborators Propose a New Method, No Manual Annotation Needed
36kr 2024-11-01T12:03:56.000000Z