Reward Function_Fishai

Articles related to "reward function"

Finding the "Reward Function" of Your Own Life

辉哥奇谭 2025-10-27T16:26:54.000000Z

RL Is the New Fine-Tuning

海外独角兽 2025-10-24T16:33:17.000000Z

Finding the "Reward Function" of Your Own Life

辉哥奇谭 2025-10-22T00:19:38.000000Z

Expressive Reward Synthesis with the Runtime Monitoring Language

cs.AI updates on arXiv.org 2025-10-21T04:21:48.000000Z

RLAF: Reinforcement Learning from Automaton Feedback

cs.AI updates on arXiv.org 2025-10-20T04:14:36.000000Z

ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning

cs.AI updates on arXiv.org 2025-10-17T04:08:17.000000Z

Repairing Reward Functions with Human Feedback to Mitigate Reward Hacking

cs.AI updates on arXiv.org 2025-10-16T04:21:06.000000Z

DeAL: Decoding-time Alignment for Large Language Models

cs.AI updates on arXiv.org 2025-10-14T04:20:46.000000Z

Fine-Tuning Diffusion Models via Intermediate Distribution Shaping

cs.AI updates on arXiv.org 2025-10-06T04:27:25.000000Z

Generalizing Behavior via Inverse Reinforcement Learning with Closed-Form Reward Centroids

cs.AI updates on arXiv.org 2025-09-16T05:44:19.000000Z

Meta-Inverse Reinforcement Learning for Mean Field Games via Probabilistic Context Variables

cs.AI updates on arXiv.org 2025-09-05T04:45:47.000000Z

A Mechanism for Mutual Fairness in Cooperative Games with Replicable Resources -- Extended Version

cs.AI updates on arXiv.org 2025-08-20T04:17:06.000000Z

The perils of under- vs over-sculpting AGI desires

LessWrong 2025-08-05T18:20:02.000000Z

Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners

cs.AI updates on arXiv.org 2025-07-28T04:43:08.000000Z

Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance

cs.AI updates on arXiv.org 2025-07-23T04:03:25.000000Z

Misalignment from Treating Means as Ends

cs.AI updates on arXiv.org 2025-07-16T04:29:03.000000Z

A Review of Reward Functions for Reinforcement Learning in the context of Autonomous Driving

cs.AI updates on arXiv.org 2025-07-15T04:24:18.000000Z

We Talked with Three University Professors About the Worsening Problem of AI Hallucination

36kr 2025-07-15T03:24:15.000000Z

Researchers Propose a Causal Bellman Equation That Reaches the Optimal Agent Faster in Certain Online Learning Algorithms

MIT Technology Review - This Week's Top Stories 2025-07-13T16:21:35.000000Z
