P2DT: A New Method for Mitigating Catastrophic Forgetting in Large-Model Agents

cs.AI updates on arXiv.org, September 4

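This paper proposes P2DT, a new method that augments a transformer-based model by dynamically appending decision tokens while it trains on each new task, which encourages the formation of task-specific policies. P2DT targets forgetting in continual and offline reinforcement learning settings and is designed to retain knowledge acquired on earlier tasks. According to the authors' preliminary results, it alleviates catastrophic forgetting and scales well as the number of task environments grows.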
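
The core idea, a growing set of per-task decision tokens, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name ProgressivePromptPool is hypothetical, and freezing earlier tasks' tokens, concatenating all prompt blocks up to the current task (as in progressive prompting for language models), and prepending them to the trajectory tokens are all assumptions. The summary does not specify how P2DT initializes, freezes, or orders its tokens.

```python
import random
from typing import List

Vector = List[float]


class ProgressivePromptPool:
    """Hypothetical sketch: one block of learnable prompt tokens per task.

    Assumption: when a new task starts, a fresh trainable block is appended
    and every earlier block is frozen, so new-task training cannot
    overwrite the tokens learned for previous tasks.
    """

    def __init__(self, tokens_per_task: int, dim: int, seed: int = 0) -> None:
        self.tokens_per_task = tokens_per_task
        self.dim = dim
        self.rng = random.Random(seed)
        self.blocks: List[List[Vector]] = []  # one block of token vectors per task
        self.trainable: List[bool] = []       # only the newest block stays trainable

    def add_task(self) -> int:
        """Freeze all existing blocks, append a new trainable one, return its task id."""
        self.trainable = [False] * len(self.trainable)
        block = [[self.rng.gauss(0.0, 0.02) for _ in range(self.dim)]
                 for _ in range(self.tokens_per_task)]
        self.blocks.append(block)
        self.trainable.append(True)
        return len(self.blocks) - 1

    def prompt_for(self, task_id: int) -> List[Vector]:
        """Concatenate the prompt blocks of tasks 0..task_id (assumed progressive layout)."""
        tokens: List[Vector] = []
        for block in self.blocks[: task_id + 1]:
            tokens.extend(block)
        return tokens

    def build_input(self, task_id: int, trajectory_tokens: List[Vector]) -> List[Vector]:
        """Prepend the task's prompt tokens to the embedded trajectory tokens (assumed order)."""
        return self.prompt_for(task_id) + trajectory_tokens

    def apply_gradient(self, task_id: int, grads: List[Vector], lr: float) -> None:
        """Plain SGD step on one task's block; frozen blocks reject updates."""
        if not self.trainable[task_id]:
            raise ValueError(f"prompt block for task {task_id} is frozen")
        for token, grad in zip(self.blocks[task_id], grads):
            for i in range(self.dim):
                token[i] -= lr * grad[i]
```

For example, after two calls to add_task() with tokens_per_task=2, prompt_for(1) returns four tokens (two per task), and apply_gradient(0, ...) raises ValueError because task 0's block is frozen. Freezing old blocks is one simple way to keep new-task training from overwriting earlier task-specific tokens; whether P2DT also updates the shared transformer weights is not stated here.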

arXiv:2401.11666v2 Announce Type: replace-cross

Abstract: Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model, causing performance degradation when these agents face new tasks. In our work, we propose a novel solution: the Progressive Prompt Decision Transformer (P2DT). This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies. Our approach mitigates forgetting in continual and offline reinforcement learning scenarios. Moreover, P2DT leverages trajectories collected via traditional reinforcement learning from all tasks and generates new task-specific tokens during training, thereby retaining knowledge from previous studies. Preliminary results demonstrate that our model effectively alleviates catastrophic forgetting and scales well with increasing task environments.
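
The abstract notes that P2DT trains on trajectories collected with conventional RL from all tasks. A Decision Transformer consumes each trajectory as interleaved (return-to-go, state, action) tokens, where the return-to-go at step t is the sum of rewards from t to the end of the episode. The sketch below assumes this standard Decision Transformer layout and tags each sequence with its task id; the helper names (Step, returns_to_go, to_dt_sequence, build_offline_dataset) are hypothetical, and the abstract does not say how P2DT batches or weights data across tasks.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Step:
    """One logged transition from an offline trajectory (hypothetical schema)."""
    state: Tuple[float, ...]
    action: int
    reward: float


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """R_t = sum_{t' >= t} gamma**(t' - t) * r_{t'}, computed backwards in O(T)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def to_dt_sequence(task_id: int, trajectory: Sequence[Step]):
    """Interleave (return-to-go, state, action) tokens; tag the sequence with its task id."""
    rtg = returns_to_go([step.reward for step in trajectory])
    seq = []
    for r, step in zip(rtg, trajectory):
        seq.append(("rtg", r))
        seq.append(("state", step.state))
        seq.append(("action", step.action))
    return task_id, seq


def build_offline_dataset(trajectories_by_task: Dict[int, List[List[Step]]]):
    """Pool task-tagged sequences from every task into one offline dataset (assumption)."""
    dataset = []
    for task_id in sorted(trajectories_by_task):
        for trajectory in trajectories_by_task[task_id]:
            dataset.append(to_dt_sequence(task_id, trajectory))
    return dataset
```

For rewards [1.0, 0.0, 2.0], returns_to_go yields [3.0, 2.0, 2.0]; the default gamma=1.0 gives the undiscounted return-to-go used by the original Decision Transformer.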
