Hot Topics
Articles related to "Preference Alignment"
SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks
cs.AI updates on arXiv.org 2025-11-03T05:20:10.000000Z
Fine-tuning Large Language Models with Limited Data: A Survey and Practical Guide
cs.AI updates on arXiv.org 2025-10-29T04:20:22.000000Z
Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
cs.AI updates on arXiv.org 2025-10-08T04:10:49.000000Z
A Contextual Quality Reward Model for Reliable and Efficient Best-of-N Sampling
cs.AI updates on arXiv.org 2025-10-07T04:16:20.000000Z
Pluralistic Off-policy Evaluation and Alignment
cs.AI updates on arXiv.org 2025-09-25T05:34:12.000000Z
OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination
cs.AI updates on arXiv.org 2025-09-03T04:16:34.000000Z
We All Misjudged GPT-5: Routing Unifies Compute, and Even Free Users Can Generate Revenue
智源社区 2025-08-15T08:16:41.000000Z
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
cs.AI updates on arXiv.org 2025-07-30T04:12:01.000000Z
PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training
cs.AI updates on arXiv.org 2025-07-29T04:21:34.000000Z
PIPA: Preference Alignment as Prior-Informed Statistical Estimation
cs.AI updates on arXiv.org 2025-07-28T04:43:06.000000Z
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
cs.AI updates on arXiv.org 2025-07-24T05:31:34.000000Z
LLaDA, the Flagship Diffusion Language Model, Gets a New Version with Improved Math, Code, and Alignment Capabilities
机器之心 2025-06-07T07:11:41.000000Z
ICML 2025 | RLHF Too Expensive and Slow? TPO, a New On-the-Fly Alignment Approach: Preference Optimization from a Single-Sentence Instruction
PaperWeekly 2025-05-21T06:12:30.000000Z
Bringing RLHF to VLA Models! Optimizing Robot Policies via Preference Alignment, Code Now Open-Source
机器之心 2024-12-27T08:09:02.000000Z
Meet GRAPE: A Plug-and-Play Algorithm to Generalize Robot Policies via Preference Alignment
MarkTechPost@AI 2024-12-08T07:49:27.000000Z
DPO Alignment for Multi-Image Scenarios! Shanghai AI Lab and Collaborators Propose a New Method, No Manual Annotation Needed
智源社区 2024-11-02T10:53:35.000000Z
DPO Alignment for Multi-Image Scenarios: Shanghai AI Lab and Collaborators Propose a New Method, No Manual Annotation Needed
36kr 2024-11-01T12:03:56.000000Z