cs.AI updates on arXiv.org, October 7
A New Preference-Alignment Framework Improves Model Reliability

This paper proposes a new data collection and modeling framework that augments preference data with an outside option, training a reward model that can distinguish not only what is "better" but also what is "good enough". Experiments show that the method substantially reduces the rate of reliability failures and improves inference speed. A rough sketch of how such a training signal might look follows below.
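As a rough illustration of how an outside option could enter reward-model training, the sketch below follows a multinomial-logit (discrete choice) formulation in PyTorch. The loss form and the learnable acceptability threshold `outside_utility` are assumptions made for illustration, not the paper's published method.

```python
# Sketch: pairwise preference loss extended with an "outside option",
# in the spirit of discrete choice models. Assumed formulation, not the
# paper's exact loss.
import torch
import torch.nn as nn


class OutsideOptionPreferenceLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable utility of choosing *neither* response ("none is good enough").
        self.outside_utility = nn.Parameter(torch.zeros(1))

    def forward(self, chosen_reward, rejected_reward, chosen_is_acceptable):
        # chosen_reward, rejected_reward: (batch,) scalar rewards from the model.
        # chosen_is_acceptable: (batch,) bool; True if the preferred response was
        # also judged acceptable, False if the outside option beat both responses.
        utilities = torch.stack(
            [
                chosen_reward,
                rejected_reward,
                self.outside_utility.expand_as(chosen_reward),
            ],
            dim=-1,
        )
        log_probs = torch.log_softmax(utilities, dim=-1)
        # Target index 0 = chosen response wins; index 2 = outside option wins.
        target = torch.where(
            chosen_is_acceptable,
            torch.zeros_like(chosen_reward, dtype=torch.long),
            torch.full_like(chosen_reward, 2, dtype=torch.long),
        )
        return nn.functional.nll_loss(log_probs, target)
```

Under this kind of formulation, the learned outside-option utility doubles as an absolute acceptability threshold that can later be reused at inference time.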

arXiv:2510.04087v1 Announce Type: cross Abstract: Modern preference alignment techniques, such as Best-of-N (BoN) sampling, rely on reward models trained with pairwise comparison data. While effective at learning relative preferences, this paradigm fails to capture a signal of response acceptability, leaving systems vulnerable to selecting the least bad of many unacceptable options. This is particularly problematic for hard prompts, where the risk of such false acceptances increases with the number of samples. In this paper, we address this critical reliability gap by introducing a new data collection and modeling framework. By augmenting preference data with an outside option, inspired by discrete choice models, we train a reward model that can distinguish not just what is better, but what is good enough. We leverage this capability to create an adaptive inference strategy, best of mini-N in-loop, which partitions the generation budget into sequential loops with a calibrated early-exit condition. Our experiments show that when tuned as an alignment guardrail, it reduces reliability failures by 70%, and when tuned as an inference accelerator, it improves average inference speed by over 22% in the IMDB sentiment setting. We thus provide a principled and flexible framework for practitioners to explicitly manage the trade-off between reliability and computational efficiency.
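A minimal sketch of the best of mini-N in-loop idea described in the abstract: generation proceeds in small sequential batches, and sampling stops early once some candidate's reward clears a calibrated acceptability threshold. The helper names (`generate`, `reward_model`) and the default budget and threshold values are assumptions for illustration, not the paper's implementation.

```python
# Sketch: adaptive Best-of-N that spends its budget in small loops and
# exits early when a candidate is judged "good enough".
from typing import Callable, List, Tuple


def best_of_mini_n_in_loop(
    prompt: str,
    generate: Callable[[str, int], List[str]],   # returns mini_n sampled responses
    reward_model: Callable[[str, str], float],   # scores a (prompt, response) pair
    total_budget: int = 16,
    mini_n: int = 4,
    accept_threshold: float = 0.0,               # calibrated, e.g. the learned outside-option utility
) -> Tuple[str, float]:
    best_response, best_score = "", float("-inf")
    for _ in range(total_budget // mini_n):
        # Draw a small batch of candidates instead of all N at once.
        for response in generate(prompt, mini_n):
            score = reward_model(prompt, response)
            if score > best_score:
                best_response, best_score = response, score
        # Early exit: stop spending budget once something is acceptable.
        if best_score >= accept_threshold:
            break
    return best_response, best_score
```

The threshold is what turns the reward model into a guardrail: raising it trades extra sampling cost for fewer reliability failures, while lowering it trades some reliability for faster average inference.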


Related tags

preference alignment, data collection, reward models, reliability, inference speed