cs.AI updates on arXiv.org, October 14, 12:20
DeAL Framework Improves LLM Alignment

This paper proposes DeAL, a framework that lets users define custom reward functions to align LLMs at decoding time. Experiments show that DeAL handles fine-grained trade-offs, improves adherence to alignment objectives, and can be effectively combined with other alignment strategies.

arXiv:2402.06147v3 Announce Type: replace

Abstract: Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF). However, it is unclear if such methods are an effective choice to teach alignment objectives to the model. First, the inability to incorporate multiple, custom rewards and reliance on a model developer's view of universal and static principles are key limitations. Second, the reliability of such approaches is also questionable (e.g., susceptibility to jailbreaking even after safety training). To address these issues, we propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs (DeAL). At its core, we view decoding as a heuristic-guided search process and facilitate the use of a wide variety of alignment objectives. Our experiments with programmatic constraints such as keyword and length constraints, and abstract objectives such as harmlessness and helpfulness, show that we can DeAL with fine-grained trade-offs and improve adherence to alignment objectives. Lastly, we demonstrate that DeAL is largely complementary to existing alignment strategies, and can be effectively paired with RLHF and prompting techniques to achieve better alignment.
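To make the "decoding as heuristic-guided search" idea concrete, here is a minimal sketch of decoding-time alignment as beam search guided by a user-defined reward. The reward function `keyword_reward`, the blend weight `alpha`, and the use of GPT-2 are illustrative assumptions for this sketch, not the paper's actual implementation or search procedure.

```python
# Sketch of decoding-time alignment as heuristic-guided beam search,
# in the spirit of DeAL. keyword_reward and alpha are illustrative
# assumptions, not the paper's implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def keyword_reward(text: str, keywords=("safe", "helpful")) -> float:
    # Toy programmatic constraint: reward partial hypotheses that mention
    # target keywords (stands in for any user-defined alignment objective).
    return sum(1.0 for kw in keywords if kw in text.lower())

@torch.no_grad()
def deal_decode(prompt: str, steps=30, beams=4, expand=8, alpha=1.0):
    # Each hypothesis is (token_ids, cumulative_log_prob).
    ids = tok(prompt, return_tensors="pt").input_ids[0]
    hyps = [(ids, 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, lp in hyps:
            logits = model(seq.unsqueeze(0)).logits[0, -1]
            logp = torch.log_softmax(logits, dim=-1)
            top_lp, top_id = torch.topk(logp, expand)
            for t_lp, t_id in zip(top_lp, top_id):
                candidates.append(
                    (torch.cat([seq, t_id.view(1)]), lp + t_lp.item())
                )
        # Heuristic-guided pruning: rank candidates by LM log-probability
        # plus the custom alignment reward, then keep the top beams.
        def score(cand):
            seq, lp = cand
            return lp + alpha * keyword_reward(tok.decode(seq))
        hyps = sorted(candidates, key=score, reverse=True)[:beams]
    return tok.decode(hyps[0][0])

print(deal_decode("The assistant replied:"))
```

In this sketch, `alpha` trades off fluency (LM log-probability) against constraint satisfaction, which loosely mirrors the fine-grained trade-offs the abstract describes; swapping `keyword_reward` for a length check or a harmlessness scorer changes the alignment objective without touching the model weights.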

