cs.AI updates on arXiv.org 10月30日 12:16
LRT-Diffusion:风险感知的离线强化学习扩散策略
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出LRT-Diffusion,一种风险感知的采样规则,用于离线强化学习。该方法将去噪步骤视为无条件先验与状态条件策略头之间的序列假设检验,通过逻辑控制器对条件均值进行门控,实现证据驱动的调整,并引入风险预算。实验表明,LRT-Diffusion在D4RL MuJoCo任务中优于Q引导的基线,同时满足期望的alpha值。

arXiv:2510.24983v1 Announce Type: cross Abstract: Diffusion policies are competitive for offline reinforcement learning (RL) but are typically guided at sampling time by heuristics that lack a statistical notion of risk. We introduce LRT-Diffusion, a risk-aware sampling rule that treats each denoising step as a sequential hypothesis test between the unconditional prior and the state-conditional policy head. Concretely, we accumulate a log-likelihood ratio and gate the conditional mean with a logistic controller whose threshold tau is calibrated once under H0 to meet a user-specified Type-I level alpha. This turns guidance from a fixed push into an evidence-driven adjustment with a user-interpretable risk budget. Importantly, we deliberately leave training vanilla (two heads with standard epsilon-prediction) under the structure of DDPM. LRT guidance composes naturally with Q-gradients: critic-gradient updates can be taken at the unconditional mean, at the LRT-gated mean, or a blend, exposing a continuum from exploitation to conservatism. We standardize states and actions consistently at train and test time and report a state-conditional out-of-distribution (OOD) metric alongside return. On D4RL MuJoCo tasks, LRT-Diffusion improves the return-OOD trade-off over strong Q-guided baselines in our implementation while honoring the desired alpha. Theoretically, we establish level-alpha calibration, concise stability bounds, and a return comparison showing when LRT surpasses Q-guidance-especially when off-support errors dominate. Overall, LRT-Diffusion is a drop-in, inference-time method that adds principled, calibrated risk control to diffusion policies for offline RL.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

离线强化学习 扩散策略 风险感知 LRT-Diffusion D4RL MuJoCo
相关文章