cs.AI updates on arXiv.org, October 31, 12:01
An LLM Preference Modeling Framework and Robustness Study

This paper proposes a human-in-the-loop framework for modeling LLM preferences: it learns to aggregate judgments produced under multiple rubric conditions, enabling preference labels to be synthesized at scale. The method's robustness to biases in both human and LLM judges is assessed through case studies.

arXiv:2510.25884v1 Announce Type: new Abstract: Aligning LLM-based judges with human preferences is a significant challenge, as they are difficult to calibrate and often suffer from rubric sensitivity, bias, and instability. Overcoming this challenge advances key applications, such as creating reliable reward models for Reinforcement Learning from Human Feedback (RLHF) and building effective routing systems that select the best-suited model for a given user query. In this work, we propose a framework for modeling diverse, persona-based preferences by learning to aggregate outputs from multiple rubric-conditioned judges. We investigate the performance of this approach against naive baselines and assess its robustness through case studies on both human and LLM-judge biases. Our primary contributions include a persona-based method for synthesizing preference labels at scale and two distinct implementations of our aggregator: a Generalized Additive Model (GAM) and a Multi-Layer Perceptron (MLP).
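To make the aggregation idea concrete, the following is a minimal sketch (not the authors' code) of an MLP aggregator in the spirit described by the abstract: per-rubric judge scores for a pair of responses are combined with a persona embedding to predict which response that persona prefers. The rubric set, persona embedding, dimensions, and variable names are all illustrative assumptions, and the training labels here are random placeholders standing in for synthesized persona-based preference labels.

```python
# Hypothetical sketch, assuming each pairwise comparison is summarized as a
# vector of per-rubric judge score differences (judge score for response A
# minus response B). Dimensions and names are illustrative, not from the paper.
import torch
import torch.nn as nn

NUM_RUBRICS = 5   # e.g. helpfulness, correctness, style, safety, brevity (assumed)
PERSONA_DIM = 8   # size of the learned persona embedding (assumed)


class MLPAggregator(nn.Module):
    def __init__(self, num_rubrics: int, num_personas: int, persona_dim: int):
        super().__init__()
        self.persona_emb = nn.Embedding(num_personas, persona_dim)
        self.net = nn.Sequential(
            nn.Linear(num_rubrics + persona_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # logit that response A is preferred
        )

    def forward(self, rubric_score_diffs: torch.Tensor, persona_ids: torch.Tensor):
        # rubric_score_diffs: (batch, num_rubrics); persona_ids: (batch,)
        persona = self.persona_emb(persona_ids)
        x = torch.cat([rubric_score_diffs, persona], dim=-1)
        return self.net(x).squeeze(-1)


# Toy usage with placeholder data standing in for synthesized preference labels.
model = MLPAggregator(NUM_RUBRICS, num_personas=10, persona_dim=PERSONA_DIM)
score_diffs = torch.randn(32, NUM_RUBRICS)
personas = torch.randint(0, 10, (32,))
labels = torch.randint(0, 2, (32,)).float()  # 1 if this persona prefers response A
loss = nn.functional.binary_cross_entropy_with_logits(
    model(score_diffs, personas), labels
)
loss.backward()
```

The GAM variant mentioned in the abstract would replace the MLP with an additive model over the per-rubric scores, trading some expressiveness for interpretability of each rubric's contribution.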


Related tags

LLM, preference modeling, robustness, case study, human-in-the-loop