New LLM Alignment Method Substantially Reduces Preference Distortion

cs.AI updates on arXiv.org, October 29, 12:17

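This paper proposes a new method, the sign estimator, which replaces cross-entropy with a binary classification loss in the aggregation step. The change reduces preference distortion in LLM alignment and gives a more accurate estimate of social welfare.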

arXiv:2510.23965v1 Announce Type: new

Abstract: Traditional LLM alignment methods are vulnerable to heterogeneity in human preferences. Fitting a naïve probabilistic model to pairwise comparison data (say, over prompt-completion pairs) yields an inconsistent estimate of the population-average utility, a canonical measure of social welfare. We propose a new method, dubbed the sign estimator, that provides a simple, provably consistent, and efficient estimator by replacing cross-entropy with binary classification loss in the aggregation step. This simple modification recovers consistent ordinal alignment under mild assumptions and achieves the first polynomial finite-sample error bounds in this setting. In realistic simulations of LLM alignment using digital twins, the sign estimator substantially reduces preference distortion over a panel of simulated personas, cutting (angular) estimation error by nearly 35% and decreasing disagreement with true population preferences from 12% to 8% compared to standard RLHF. Our method also compares favorably to panel data heuristics that explicitly model user heterogeneity and require tracking individual-level preference data, all while maintaining the implementation simplicity of existing LLM alignment pipelines.
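The abstract claims that a naïve probabilistic fit to pooled comparisons is inconsistent for the population-average utility. A small worked example can check this claim. The sketch below assumes that each persona k follows its own Bradley-Terry model, P(a ≻ b) = σ(u_k(a) − u_k(b)). This is our assumption for illustration and may not be the paper's exact setup. It does not implement the sign estimator itself, because the abstract does not spell out the exact classification loss.

All helper names (naive_bt_gap, angular_error_deg, disagreement_rate) are hypothetical. The two metric functions are our reading of "angular estimation error" and "disagreement with true population preferences", assuming a linear utility u(x) = θ·x.

```python
import math
from collections.abc import Hashable, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, i.e. the Bradley-Terry link."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def mean_gap(persona_gaps: Sequence[float]) -> float:
    """Population-average utility gap u(a) - u(b) across personas (the welfare target)."""
    return sum(persona_gaps) / len(persona_gaps)


def naive_bt_gap(persona_gaps: Sequence[float]) -> float:
    """Gap a single pooled Bradley-Terry model must assign to a pair so that
    sigmoid(gap) equals the population-average win probability.
    Hypothetical helper; assumes persona k prefers a with probability sigmoid(gap_k)."""
    p_pooled = sum(sigmoid(d) for d in persona_gaps) / len(persona_gaps)
    return logit(p_pooled)


def angular_error_deg(theta_hat: Sequence[float], theta_true: Sequence[float]) -> float:
    """Angle in degrees between estimated and true weight vectors of a linear utility
    (assumed metric definition)."""
    dot = sum(a * b for a, b in zip(theta_hat, theta_true))
    norm_hat = math.sqrt(sum(a * a for a in theta_hat))
    norm_true = math.sqrt(sum(b * b for b in theta_true))
    cos = max(-1.0, min(1.0, dot / (norm_hat * norm_true)))  # clamp floating-point rounding
    return math.degrees(math.acos(cos))


def disagreement_rate(
    u_hat: dict[Hashable, float],
    u_true: dict[Hashable, float],
    pairs: list[tuple[Hashable, Hashable]],
) -> float:
    """Fraction of pairs ordered oppositely by the estimated and the true
    population-average utility (assumed metric). Zero gaps are not counted as flips."""
    flips = sum(1 for a, b in pairs if (u_hat[a] - u_hat[b]) * (u_true[a] - u_true[b]) < 0)
    return flips / len(pairs)


if __name__ == "__main__":
    # Two pairs with the same population-average utility gap of 1.0.
    polarized = [4.0, -2.0]  # the two personas disagree strongly
    unanimous = [1.0, 1.0]   # the two personas agree
    for name, gaps in [("polarized", polarized), ("unanimous", unanimous)]:
        print(f"{name}: mean gap = {mean_gap(gaps):.3f}, naive BT gap = {naive_bt_gap(gaps):.3f}")
```

Running the script prints the following:

- The polarized pair has a naive BT gap of 0.203.
- The unanimous pair has a naive BT gap of 1.000.
- Both pairs share a mean gap of 1.000.

Because the same welfare gap maps to different fitted gaps, no single rescaling of the naive estimate recovers the population-average utility. Both pairs keep the correct sign, however. This matches the abstract's point that ordinal information can survive aggregation under suitable assumptions, which the abstract does not state in detail.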
