Articles related to "reward function"
Finding the "Reward Function" of Your Own Life
辉哥奇谭
2025-10-27T16:26:54.000000Z
RL Is the New Fine-Tuning
海外独角兽
2025-10-24T16:33:17.000000Z
Finding the "Reward Function" of Your Own Life
辉哥奇谭
2025-10-22T00:19:38.000000Z
Expressive Reward Synthesis with the Runtime Monitoring Language
cs.AI updates on arXiv.org
2025-10-21T04:21:48.000000Z
RLAF: Reinforcement Learning from Automaton Feedback
cs.AI updates on arXiv.org
2025-10-20T04:14:36.000000Z
ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
cs.AI updates on arXiv.org
2025-10-17T04:08:17.000000Z
Repairing Reward Functions with Human Feedback to Mitigate Reward Hacking
cs.AI updates on arXiv.org
2025-10-16T04:21:06.000000Z
DeAL: Decoding-time Alignment for Large Language Models
cs.AI updates on arXiv.org
2025-10-14T04:20:46.000000Z
Fine-Tuning Diffusion Models via Intermediate Distribution Shaping
cs.AI updates on arXiv.org
2025-10-06T04:27:25.000000Z
Generalizing Behavior via Inverse Reinforcement Learning with Closed-Form Reward Centroids
cs.AI updates on arXiv.org
2025-09-16T05:44:19.000000Z
Meta-Inverse Reinforcement Learning for Mean Field Games via Probabilistic Context Variables
cs.AI updates on arXiv.org
2025-09-05T04:45:47.000000Z
A Mechanism for Mutual Fairness in Cooperative Games with Replicable Resources -- Extended Version
cs.AI updates on arXiv.org
2025-08-20T04:17:06.000000Z
The perils of under- vs over-sculpting AGI desires
少点错误 (LessWrong)
2025-08-05T18:20:02.000000Z
Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
cs.AI updates on arXiv.org
2025-07-28T04:43:08.000000Z
Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance
cs.AI updates on arXiv.org
2025-07-23T04:03:25.000000Z
Misalignment from Treating Means as Ends
cs.AI updates on arXiv.org
2025-07-16T04:29:03.000000Z
A Review of Reward Functions for Reinforcement Learning in the context of Autonomous Driving
cs.AI updates on arXiv.org
2025-07-15T04:24:18.000000Z
We Talked with Three University Professors About the Worsening Problem of AI Hallucinations
36kr
2025-07-15T03:24:15.000000Z
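Researchers Propose a Causal Bellman Equation That Reaches Optimal Agents Faster in Certain Online Learning Algorithms
MIT Technology Review - This Week's Top Stories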
2025-07-13T16:21:35.000000Z