Sample Efficiency_Fishai

Articles related to "Sample Efficiency"

Off-policy Reinforcement Learning with Model-based Exploration Augmentation

cs.AI updates on arXiv.org 2025-10-30T04:13:22.000000Z

Grouping Nodes With Known Value Differences: A Lossless UCT-based Abstraction Algorithm

cs.AI updates on arXiv.org 2025-10-30T04:13:01.000000Z

Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms

cs.AI updates on arXiv.org 2025-10-29T04:18:54.000000Z

Human-Like Goalkeeping in a Realistic Football Simulation: a Sample-Efficient Reinforcement Learning Approach

cs.AI updates on arXiv.org 2025-10-28T04:05:19.000000Z

MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents

cs.AI updates on arXiv.org 2025-10-27T06:35:08.000000Z

Efficient semantic uncertainty quantification in language models via diversity-steered sampling

cs.AI updates on arXiv.org 2025-10-27T06:25:07.000000Z

Generalizable Hierarchical Skill Learning via Object-Centric Representation

cs.AI updates on arXiv.org 2025-10-27T06:23:49.000000Z

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

cs.AI updates on arXiv.org 2025-10-23T04:14:07.000000Z

Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation

cs.AI updates on arXiv.org 2025-10-22T04:26:50.000000Z

Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options

cs.AI updates on arXiv.org 2025-10-22T04:24:54.000000Z

Optimizing Energy Management of Smart Grid using Reinforcement Learning aided by Surrogate models built using Physics-informed Neural Networks

cs.AI updates on arXiv.org 2025-10-21T04:28:16.000000Z

LANPO: Bootstrapping Language and Numerical Feedback for Reinforcement Learning in LLMs

cs.AI updates on arXiv.org 2025-10-21T04:25:28.000000Z

Cog-Rethinker: Hierarchical Metacognitive Reinforcement Learning for LLM Reasoning

cs.AI updates on arXiv.org 2025-10-21T04:16:02.000000Z

Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents

cs.AI updates on arXiv.org 2025-10-17T04:19:15.000000Z

Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation

cs.AI updates on arXiv.org 2025-10-17T04:16:28.000000Z

Towards Agentic Self-Learning LLMs in Search Environment

cs.AI updates on arXiv.org 2025-10-17T04:08:32.000000Z

Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents

cs.AI updates on arXiv.org 2025-10-16T04:29:13.000000Z

Copyright © 2019 FISHAI. All Rights Reserved