Articles related to "sample efficiency"
Off-policy Reinforcement Learning with Model-based Exploration Augmentation
cs.AI updates on arXiv.org
2025-10-30T04:13:22.000000Z
Grouping Nodes With Known Value Differences: A Lossless UCT-based Abstraction Algorithm
cs.AI updates on arXiv.org
2025-10-30T04:13:01.000000Z
Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms
cs.AI updates on arXiv.org
2025-10-29T04:18:54.000000Z
Human-Like Goalkeeping in a Realistic Football Simulation: a Sample-Efficient Reinforcement Learning Approach
cs.AI updates on arXiv.org
2025-10-28T04:05:19.000000Z
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
cs.AI updates on arXiv.org
2025-10-27T06:35:08.000000Z
Efficient semantic uncertainty quantification in language models via diversity-steered sampling
cs.AI updates on arXiv.org
2025-10-27T06:25:07.000000Z
Generalizable Hierarchical Skill Learning via Object-Centric Representation
cs.AI updates on arXiv.org
2025-10-27T06:23:49.000000Z
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
cs.AI updates on arXiv.org
2025-10-23T04:14:07.000000Z
Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation
cs.AI updates on arXiv.org
2025-10-22T04:26:50.000000Z
Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options
cs.AI updates on arXiv.org
2025-10-22T04:24:54.000000Z
Optimizing Energy Management of Smart Grid using Reinforcement Learning aided by Surrogate models built using Physics-informed Neural Networks
cs.AI updates on arXiv.org
2025-10-21T04:28:16.000000Z
LANPO: Bootstrapping Language and Numerical Feedback for Reinforcement Learning in LLMs
cs.AI updates on arXiv.org
2025-10-21T04:25:28.000000Z
Cog-Rethinker: Hierarchical Metacognitive Reinforcement Learning for LLM Reasoning
cs.AI updates on arXiv.org
2025-10-21T04:16:02.000000Z
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
cs.AI updates on arXiv.org
2025-10-17T04:19:15.000000Z
Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation
cs.AI updates on arXiv.org
2025-10-17T04:16:28.000000Z
Towards Agentic Self-Learning LLMs in Search Environment
cs.AI updates on arXiv.org
2025-10-17T04:08:32.000000Z
Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents
cs.AI updates on arXiv.org
2025-10-16T04:29:13.000000Z