cs.AI updates on arXiv.org, September 4
P2DT: A New Method for Mitigating Catastrophic Forgetting in Large-Model Agents

 

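This paper proposes a new method, P2DT, which augments a transformer model by dynamically appending decision tokens while training on a new task, encouraging task-specific policies to form. P2DT mitigates catastrophic forgetting in continual and offline reinforcement learning settings, retains knowledge from previous studies, and scales well as the number of task environments grows.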

arXiv:2401.11666v2 Announce Type: replace-cross

Abstract: Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model, causing performance degradation when these agents face new tasks. In our work, we propose a novel solution - the Progressive Prompt Decision Transformer (P2DT). This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies. Our approach mitigates forgetting in continual and offline reinforcement learning scenarios. Moreover, P2DT leverages trajectories collected via traditional reinforcement learning from all tasks and generates new task-specific tokens during training, thereby retaining knowledge from previous studies. Preliminary results demonstrate that our model effectively alleviates catastrophic forgetting and scales well with increasing task environments.
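The idea of appending task-specific decision tokens can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how tokens are stored, combined, or trained, so the class name `ProgressivePromptPool`, its methods, the choice to freeze earlier tasks' tokens, and the choice to prefix the trajectory with the tokens of all tasks up to the current one are assumptions made purely for illustration. Plain Python lists stand in for embedding vectors.

```python
import random
from typing import Dict, List

Vector = List[float]


class ProgressivePromptPool:
    """Hypothetical store of per-task prompt ("decision") tokens.

    Assumption: each new task gets fresh trainable tokens, and the tokens
    of earlier tasks are frozen so later training cannot overwrite them.
    """

    def __init__(self, embed_dim: int, tokens_per_task: int, seed: int = 0):
        self.embed_dim = embed_dim
        self.tokens_per_task = tokens_per_task
        self._rng = random.Random(seed)
        self.prompts: Dict[str, List[Vector]] = {}  # insertion order = task order
        self.frozen: Dict[str, bool] = {}

    def add_task(self, task_id: str) -> None:
        """Freeze all existing prompts, then append new ones for task_id."""
        if task_id in self.prompts:
            raise ValueError(f"task {task_id!r} already registered")
        for known in self.frozen:
            self.frozen[known] = True
        self.prompts[task_id] = [
            [self._rng.gauss(0.0, 0.02) for _ in range(self.embed_dim)]
            for _ in range(self.tokens_per_task)
        ]
        self.frozen[task_id] = False

    def build_input(self, task_id: str, trajectory: List[Vector]) -> List[Vector]:
        """Prefix a trajectory with the prompts of every task up to task_id."""
        if task_id not in self.prompts:
            raise KeyError(task_id)
        prefix: List[Vector] = []
        for known, tokens in self.prompts.items():
            prefix.extend(tokens)
            if known == task_id:
                break
        return prefix + list(trajectory)

    def sgd_step(self, task_id: str, grads: List[Vector], lr: float = 0.1) -> None:
        """Apply a gradient step to one task's prompts; frozen tasks are rejected."""
        if self.frozen[task_id]:
            raise RuntimeError(f"prompts for {task_id!r} are frozen")
        for token, grad in zip(self.prompts[task_id], grads):
            for i, g in enumerate(grad):
                token[i] -= lr * g


pool = ProgressivePromptPool(embed_dim=4, tokens_per_task=2)
pool.add_task("task_a")
pool.add_task("task_b")  # freezes task_a's tokens

trajectory = [[0.0] * 4 for _ in range(3)]  # stand-in for embedded trajectory tokens
seq_a = pool.build_input("task_a", trajectory)
seq_b = pool.build_input("task_b", trajectory)
print(len(seq_a), len(seq_b))  # prints "5 7"
```

In this sketch the input for the second task is longer because it carries both tasks' tokens, and only the newest task's tokens accept gradient updates. A real model would feed the resulting sequence to a transformer backbone, which is omitted here.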


Tags: P2DT, catastrophic forgetting, transformer models, reinforcement learning, knowledge retention