cs.AI updates on arXiv.org, October 16
RLHF with KL Regularization under Privacy Protection

This paper studies the offline and online settings of KL-regularized reinforcement learning from human feedback (RLHF) under the ε-local differential privacy (ε-LDP) model. In the offline setting, we design a pessimism-based algorithm and derive a new suboptimality gap for the KL-regularized objective under single-policy concentrability. In the online setting, we provide the first theoretical study of KL-regularized RLHF with LDP and design an optimism-based algorithm.
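For reference, the KL-regularized objective mentioned above is conventionally written as $\max_{\pi}\ \mathbb{E}_{x\sim\rho,\, a\sim\pi(\cdot|x)}[r(x,a)] - \beta\, \mathbb{E}_{x\sim\rho}\big[\mathrm{KL}\big(\pi(\cdot|x)\,\|\,\pi_{\mathrm{ref}}(\cdot|x)\big)\big]$, where $\pi_{\mathrm{ref}}$ is the reference policy and $\beta>0$ the regularization coefficient. The notation here is an assumption for illustration; the summary itself does not fix a particular parameterization.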

arXiv:2510.13512v1 Announce Type: cross Abstract: In this paper, we study the offline and online settings of reinforcement learning from human feedback (RLHF) with KL-regularization -- a widely used objective function in large language model alignment -- under the $\epsilon$-local differential privacy ($\epsilon$-LDP) model on the human preference labels. In the offline setting, we design an algorithm based on the principle of pessimism and derive a new suboptimality gap of $\tilde{O}(1/[(e^\epsilon-1)^2 n])$ on the KL-regularized objective under single-policy concentrability, where $n$ is the sample size. We also prove its optimality by providing a matching lower bound. In the online setting, we are the first to theoretically investigate KL-regularized RLHF with LDP. We design an optimism-based algorithm and derive a logarithmic regret bound of $O(d_{\mathcal{F}}\log (N_{\mathcal{F}}\cdot T) /(e^\epsilon-1)^2 )$, where $T$ is the total number of time steps, $N_{\mathcal{F}}$ is the cardinality of the reward function space $\mathcal{F}$, and $d_{\mathcal{F}}$ is a variant of the eluder dimension for RLHF. As a by-product of our analysis, our results also imply the first analysis of online KL-regularized RLHF without privacy. We implement our algorithm in the offline setting to verify our theoretical results and release our open-source code at: https://github.com/rushil-thareja/PPKL-RLHF-Official.
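The abstract does not specify the privatization mechanism, but a minimal sketch of how $\epsilon$-LDP is commonly enforced on a binary preference label is randomized response, shown below. The function names and the debiasing step are illustrative assumptions rather than the paper's implementation; the debiased estimate has variance on the order of $1/(e^\epsilon-1)^2$, which is consistent with the dependence appearing in the bounds above.

import math
import random


def privatize_label(y: int, eps: float) -> int:
    # Randomized response: keep the true binary label y in {0, 1} with
    # probability e^eps / (e^eps + 1), otherwise flip it. This satisfies
    # eps-local differential privacy for the label.
    p_keep = math.exp(eps) / (math.exp(eps) + 1.0)
    return y if random.random() < p_keep else 1 - y


def debias(y_priv: int, eps: float) -> float:
    # Unbiased estimate of the true label from the privatized one:
    # E[y_priv] = (1 - p_keep) + y * (2 * p_keep - 1), so invert that map.
    # The resulting variance scales like 1 / (e^eps - 1)^2 for small eps.
    p_keep = math.exp(eps) / (math.exp(eps) + 1.0)
    return (y_priv - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)


if __name__ == "__main__":
    eps = 1.0
    true_labels = [1, 0, 1, 1, 0]  # hypothetical human preference labels
    noisy = [privatize_label(y, eps) for y in true_labels]
    print(noisy, [round(debias(y, eps), 3) for y in noisy])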

Related tags

Reinforcement Learning, Human Feedback, KL Regularization, Privacy Protection, Differential Privacy