Articles related to "偏好学习" (Preference Learning)
CURATRON: Complete and Robust Preference Data for Rigorous Alignment of Large Language Models
cs.AI updates on arXiv.org
2025-10-30T04:21:54.000000Z
Preference Learning with Response Time: Robust Losses and Guarantees
cs.AI updates on arXiv.org
2025-10-29T04:20:25.000000Z
Representer Theorems for Metric and Preference Learning: Geometric Insights and Algorithms
cs.AI updates on arXiv.org
2025-10-28T04:14:37.000000Z
Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options
cs.AI updates on arXiv.org
2025-10-22T04:24:54.000000Z
Direct Preference Optimization with Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences
cs.AI updates on arXiv.org
2025-10-20T04:09:37.000000Z
RealDPO: Real or Not Real, that is the Preference
cs.AI updates on arXiv.org
2025-10-17T04:19:13.000000Z
Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures
cs.AI updates on arXiv.org
2025-10-17T04:18:42.000000Z
Repairing Reward Functions with Human Feedback to Mitigate Reward Hacking
cs.AI updates on arXiv.org
2025-10-16T04:21:06.000000Z
Large Models Can't Use Tools? RUC's Tool-Light: Not a Problem!
PaperAgent
2025-10-09T04:23:36.000000Z
Reward Model Perspectives: Whose Opinions Do Reward Models Reward?
cs.AI updates on arXiv.org
2025-10-09T04:07:09.000000Z
Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
cs.AI updates on arXiv.org
2025-10-07T04:12:02.000000Z
Annotation-Efficient Language Model Alignment via Diverse and Representative Response Texts
cs.AI updates on arXiv.org
2025-09-18T04:55:50.000000Z
Learning to Plan with Personalized Preferences
cs.AI updates on arXiv.org
2025-09-15T08:35:11.000000Z
Temporal Preference Optimization for Long-Form Video Understanding
cs.AI updates on arXiv.org
2025-09-03T04:18:10.000000Z
Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future
cs.AI updates on arXiv.org
2025-08-11T04:08:41.000000Z
Bringing Preference Learning into Model Training: A New Framework from Li Ge's Team at Peking University Significantly Improves Code Accuracy and Execution Efficiency
智源社区
2024-11-28T13:53:29.000000Z
Evaluation is All You Need! LLaVA-Critic: The First Open-Source General-Purpose Evaluator for Multimodal Large Models
机器之心
2024-10-14T07:12:09.000000Z