A Control-Based Framework for LLM Alignment

cs.AI updates on arXiv.org, November 6, 13:10

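This paper proposes a control-based framework for aligning large language models (LLMs) that uses a control barrier function (CBF) to ensure user-desirable text generation. The method applies a CBF safety filter to the tokens predicted by the baseline LLM in order to intervene in the generated text. The safety filter has two notable advantages: as an add-on, it can be used for alignment without fine-tuning the baseline LLM; and if an evaluation model for the desired alignment exists, that model can be applied directly to the filter design. The full text-generation system is implemented with open-source language models and aims to generate positive text.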

arXiv:2511.03121v1 Announce Type: cross Abstract: This paper proposes a control-based framework for aligning large language models (LLMs) by leveraging a control barrier function (CBF) to ensure user-desirable text generation. The presented framework applies the CBF safety filter to the predicted token generated from the baseline LLM, to intervene in the generated text. The safety filter includes two significant advantages: this safety filter is an add-on type, allowing it to be used for alignment purposes without fine-tuning the baseline LLM, and if there is an evaluation model regarding the desired alignment, it can be directly applied to the filter design. The overall text-generation system is implemented with open-source language models, aiming to generate positive text.
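The abstract does not give the filter's equations, so the following is only a minimal sketch of how an add-on safety filter of this kind might sit on top of a baseline model's token predictions. It assumes a generic discrete-time CBF condition, h(x_next) >= (1 - alpha) * h(x), which is a textbook form and not necessarily the one used in the paper. The names `TOY_SENTIMENT`, `barrier`, and `cbf_filter` are hypothetical, and the toy lexicon stands in for the learned evaluation model the abstract mentions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy lexicon standing in for a learned evaluation model.
TOY_SENTIMENT = {"great": 1.0, "fine": 0.25, "awful": -1.0, "movie": 0.0, "the": 0.0}


def barrier(tokens: Sequence[str]) -> float:
    """Toy barrier function h(x): non-negative when the text is at least neutral."""
    return sum(TOY_SENTIMENT.get(t, 0.0) for t in tokens)


def cbf_filter(
    prefix: Sequence[str],
    candidates: List[Tuple[str, float]],
    h: Callable[[Sequence[str]], float] = barrier,
    alpha: float = 0.5,
) -> str:
    """Pick the next token from the baseline's (token, probability) candidates.

    Assumed condition (generic discrete-time CBF, not taken from the paper):
        h(prefix + [token]) >= (1 - alpha) * h(prefix),  with 0 < alpha <= 1.
    The most probable candidate satisfying it is returned, so the baseline's
    choice is kept whenever it is already safe.
    """
    h_now = h(prefix)
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for token, _prob in ranked:
        if h(list(prefix) + [token]) >= (1.0 - alpha) * h_now:
            return token
    # No candidate satisfies the condition: fall back to the least unsafe one.
    return max(ranked, key=lambda c: h(list(prefix) + [c[0]]))[0]


# Usage: the baseline prefers "awful", but the filter intervenes.
next_token = cbf_filter(
    ["the", "movie"], [("awful", 0.6), ("fine", 0.3), ("great", 0.1)]
)
```

Because the filter only reorders the baseline's candidates, the baseline model's weights are never touched, which mirrors the add-on property described in the abstract. In a real system the candidates would come from the LLM's next-token distribution and `h` from a trained evaluator.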
