Trending
Articles related to "preference optimization"
Semi-Supervised Preference Optimization with Limited Feedback
cs.AI updates on arXiv.org 2025-11-05T05:16:48.000000Z
Meta-Learning Objectives for Preference Optimization
cs.AI updates on arXiv.org 2025-10-30T04:23:21.000000Z
RePO: Understanding Preference Learning Through ReLU-Based Optimization
cs.AI updates on arXiv.org 2025-10-28T04:14:38.000000Z
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
cs.AI updates on arXiv.org 2025-10-27T06:31:17.000000Z
ADPO: Anchored Direct Preference Optimization
cs.AI updates on arXiv.org 2025-10-23T04:13:18.000000Z
Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization
cs.AI updates on arXiv.org 2025-10-10T04:17:31.000000Z
Provably Mitigating Corruption, Overoptimization, and Verbosity Simultaneously in Offline and Online RLHF/DPO Alignment
cs.AI updates on arXiv.org 2025-10-08T04:12:10.000000Z
Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
cs.AI updates on arXiv.org 2025-10-08T04:10:49.000000Z
From Noisy Traces to Stable Gradients: Bias-Variance Optimized Preference Optimization for Aligning Large Reasoning Models
cs.AI updates on arXiv.org 2025-10-07T04:18:14.000000Z
Emotion-Aligned Generation in Diffusion Text to Speech Models via Preference-Guided Optimization
cs.AI updates on arXiv.org 2025-10-01T06:00:20.000000Z
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
cs.AI updates on arXiv.org 2025-09-30T04:04:30.000000Z
Adaptive Margin RLHF via Preference over Preferences
cs.AI updates on arXiv.org 2025-09-30T04:03:35.000000Z
USB-Rec: An Effective Framework for Improving Conversational Recommendation Capability of Large Language Model
cs.AI updates on arXiv.org 2025-09-26T04:20:47.000000Z
LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization
cs.AI updates on arXiv.org 2025-09-23T05:57:30.000000Z
TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning
cs.AI updates on arXiv.org 2025-09-18T04:51:12.000000Z
The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features
cs.AI updates on arXiv.org 2025-09-17T04:52:49.000000Z
Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization
cs.AI updates on arXiv.org 2025-09-17T04:46:33.000000Z
TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making
cs.AI updates on arXiv.org 2025-09-11T15:51:25.000000Z
Selective Preference Optimization via Token-Level Reward Function Estimation
cs.AI updates on arXiv.org 2025-09-08T04:51:58.000000Z