cs.AI updates on arXiv.org, 9 hours ago
Optimizing Offline Reinforcement Learning with Data Distillation

This paper proposes using data distillation to train and distill a higher-quality dataset for offline reinforcement learning, enabling a model trained on only a small amount of data to match the performance of one trained on the full dataset or with behavioral cloning.

arXiv:2407.20299v3 Announce Type: replace-cross Abstract: Offline reinforcement learning often requires a high-quality dataset on which we can train a policy. However, in many situations such a dataset is not available, nor is it easy to train a policy that performs well in the actual environment given only offline data. We propose using data distillation to train and distill a better dataset, which can then be used to train a better policy model. We show that our method is able to synthesize a dataset on which a trained model achieves performance similar to a model trained on the full dataset or a model trained using percentile behavioral cloning. Our project site is available at https://datasetdistillation4rl.github.io . We also provide our implementation at https://github.com/ggflow123/DDRL .
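The abstract links to the authors' implementation; the sketch below is only a rough illustration of the general idea, not the DDRL code itself. It distills a small synthetic behavior-cloning dataset by backpropagating through a short inner training loop, a common dataset-distillation scheme. All names, network sizes, and hyperparameters are assumptions, and the real offline dataset is replaced by random tensors.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy continuous-control setup (all sizes are assumptions).
STATE_DIM, ACTION_DIM = 8, 2
REAL_N, SYN_N = 4096, 64          # real offline transitions vs. distilled set
INNER_STEPS, INNER_LR = 4, 0.1    # short inner loop trained on synthetic data

# Stand-in for a real offline dataset; random tensors here for illustration.
real_states = torch.randn(REAL_N, STATE_DIM)
real_actions = torch.randn(REAL_N, ACTION_DIM)

# Learnable synthetic dataset: the object being distilled.
syn_states = torch.randn(SYN_N, STATE_DIM, requires_grad=True)
syn_actions = torch.randn(SYN_N, ACTION_DIM, requires_grad=True)
opt = torch.optim.Adam([syn_states, syn_actions], lr=1e-2)

def policy(params, s):
    """Tiny two-layer policy applied functionally so we can
    differentiate through the inner-loop parameter updates."""
    w1, b1, w2, b2 = params
    return torch.tanh(s @ w1 + b1) @ w2 + b2

def init_params():
    # Fresh random initialization each outer step, so the distilled
    # data must work for many initializations, not just one.
    return [torch.randn(STATE_DIM, 64) * 0.1, torch.zeros(64),
            torch.randn(64, ACTION_DIM) * 0.1, torch.zeros(ACTION_DIM)]

for outer_step in range(200):
    # Inner loop: behavior-clone a fresh policy on the synthetic data,
    # keeping the graph so gradients flow back to syn_states/syn_actions.
    params = [p.clone().requires_grad_(True) for p in init_params()]
    for _ in range(INNER_STEPS):
        inner_loss = F.mse_loss(policy(params, syn_states), syn_actions)
        grads = torch.autograd.grad(inner_loss, params, create_graph=True)
        params = [p - INNER_LR * g for p, g in zip(params, grads)]

    # Outer loss: how well that inner-trained policy imitates the real
    # offline data. Its gradient updates the synthetic dataset itself.
    idx = torch.randint(0, REAL_N, (256,))
    outer_loss = F.mse_loss(policy(params, real_states[idx]),
                            real_actions[idx])
    opt.zero_grad()
    outer_loss.backward()
    opt.step()
```

This meta-gradient formulation is just one way to realize data distillation; the paper may use a different objective or matching criterion, so consult the linked repository for the actual method.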


Related tags

Data distillation, offline reinforcement learning, model training, performance optimization