Pacing Outside the Box: RNNs Learn to Plan in Sokoban

This post describes how researchers analyzed a recurrent neural network (RNN) that plays Sokoban puzzles, found evidence of "planning" behavior, and verified the phenomenon experimentally. They observed that when the RNN is given more time to think, it plans its actions better and ultimately solves the level, making the network a concrete example for studying internal goals that may differ from the training objective (the inner alignment problem).

😄 **Discovery of planning behavior:** The researchers found that giving the RNN more thinking time improves its ability to solve Sokoban levels, indicating that it plans and uses that planning to refine its actions. For example, the RNN "paces" before it starts acting, buying itself more computation to find a better solution.

🤔 **A "greedy" strategy revealed:** The results show the RNN tends toward a greedy strategy, taking the most immediate solution even when doing so can make the rest of the level unsolvable. With extra thinking time, the RNN overcomes this greediness and finds the longer-horizon solution.

💡 **Why inner alignment matters:** The researchers argue that understanding how neural networks reason, and ultimately locating where they evaluate plans, is crucial to solving the inner alignment problem. This work is an important first step toward automatically detecting and understanding networks' internal goals, and toward modifying those goals or planning procedures to match the intended objective.

🚀 **Future directions:** The researchers plan to study how to automatically detect and understand networks' internal goals, and how to modify those goals or planning procedures to align with the intended objective, opening new directions for AI safety research.

🎯 **Connection to LeelaChess:** The researchers believe LeelaChess likely also plans. Analyzing it could further illuminate planning mechanisms in neural networks and yield deeper insight into the inner alignment problem.

Published on July 25, 2024 10:00 PM GMT

Work done at FAR AI.

There has been a lot of conceptual work on mesa-optimizers: neural networks that develop internal goals that may differ from their training objectives (the inner alignment problem). There is an abundance of good ideas for empirical work (find search in a NN, interpret it), but very little actual execution, partly because we did not have a clear-cut example of a mesa-optimizer to study. Until now.[1]

We have replicated the mesa-optimizer that Guez et al. (2019) found, and released it open-source as a model organism for inner alignment research. In brief, Guez et al. trained a recurrent neural network (RNN) with model-free RL to play Sokoban. They noticed that if you give the RNN more time to think by repeating the initial observation at inference time, its performance increases. This is highly suggestive of planning!
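To make the intervention concrete, here is a minimal sketch of "thinking time" at inference, assuming a generic recurrent-policy interface; `policy.step`, `policy.initial_state`, and the Gym-style `env` are illustrative stand-ins, not the released code:

```python
def run_with_thinking(policy, env, thinking_steps):
    """Roll out a recurrent policy, first repeating the initial observation
    `thinking_steps` times without acting, so only the RNN's hidden state
    advances ("thinking") before the episode begins."""
    obs = env.reset()
    state = policy.initial_state()

    # Extra forward passes on the frozen first frame: the environment
    # does not step, only the recurrent state is updated.
    for _ in range(thinking_steps):
        _, state = policy.step(obs, state)

    total_reward, done = 0.0, False
    while not done:
        action, state = policy.step(obs, state)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward
```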

We investigate this "planning effect" in a black-box way. We find that often, the RNN learns to "pace" before attempting to solve the level, likely to get more computation and find a solution. When we give the RNN time to think, it finds the solution in the extra thinking time and executes it straight away.

In other cases, the RNN sometimes starts with a greedy solution and locks itself out of the solution. With thinking time, the RNN finds the non-myopic solution, avoiding the lock and solving the level. Note that this greedy behavior may be bounded-rational given the -0.1 penalty per step: solving fewer levels but solving them more quickly can pay off.
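A back-of-the-envelope calculation shows how this can pay off. Only the -0.1 step penalty is stated above; the +10 completion reward and 120-step episode limit below are illustrative assumptions:

$$\underbrace{0.9\,(10 - 20 \cdot 0.1) + 0.1\,(-120 \cdot 0.1)}_{\text{greedy: solves 9/10 levels in 20 steps}} = 6.0 \;>\; \underbrace{10 - 60 \cdot 0.1}_{\text{careful: solves every level in 60 steps}} = 4.0$$

Under these numbers the fast greedy policy earns more on average, even though it fails a tenth of the levels.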

These are illustrative examples, but we have quantitative evidence too. We operationalize the pacing behavior as anything that creates a cycle in the sequence of environment states. If we give the RNN time to think at level start, it no longer 'paces': 75% of cycles that occur in the first 5 steps disappear. Time to think in the middle of a level also substitutes for cycles: 82% of N-step cycles disappear when given N steps to think.
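A minimal sketch of this operationalization, assuming episode states are recorded as hashable values (e.g. board layouts as tuples):

```python
def find_cycles(states):
    """Given the sequence of environment states in one episode, return
    (start, end) index pairs such that states[start] == states[end],
    i.e. the steps in between form a cycle ("pacing")."""
    last_seen = {}   # state -> index of its most recent occurrence
    cycles = []
    for i, s in enumerate(states):
        if s in last_seen:
            cycles.append((last_seen[s], i))
        last_seen[s] = i
    return cycles
```

Counting how many of these cycles survive when thinking steps are prepended (or inserted mid-level) yields statistics like the 75% and 82% figures above.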

The levels we use always have 4 boxes. Thinking time barely changes the average time the RNN takes to place boxes 1-3. But when we filter to levels it cannot solve with 0 thinking steps yet can solve with 6, thinking time greatly increases the time to place boxes 1-3, even though the time to place the 4th box barely changes. This indicates the network is greedy by default, and thinking time remedies that.
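One way to compute this statistic, assuming hypothetical episode logs that record how many boxes sit on targets at each step (the `boxes_on_target` field is an assumed format, not the released code):

```python
def box_placement_steps(episode, num_boxes=4):
    """For k = 1..num_boxes, return the first step at which at least
    k boxes are on targets (None if that never happens)."""
    firsts = {}
    for t, n in enumerate(episode["boxes_on_target"]):
        for k in range(1, n + 1):
            firsts.setdefault(k, t)
    return [firsts.get(k) for k in range(1, num_boxes + 1)]
```

Averaging these per-box times over levels solved with 6 thinking steps but not with 0 then isolates the greedy-by-default effect.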

Understanding how neural networks reason, and ultimately locating where they evaluate plans, is crucial to solving inner alignment. This represents an important first step in our longer-term research agenda to automatically detect mesa-optimizers, understand their goals, and modify the goals or planning procedures to align with the intended objective.

For more information, read our blog post or full paper “Planning behavior in a recurrent neural network that plays Sokoban.” And, if you're at ICML, come talk to us at the Mechanistic Interpretability workshop on Saturday!

If you are interested in working on problems in AI safety, we’re hiring. We’re also open to exploring collaborations with researchers at other institutions – just reach out at hello@far.ai.

  1. ^

    We believe LeelaChess is likely also planning. Thanks to Jenner et al., we have a handle on where the values may be represented and a starting place to understand the planning algorithm. However, it is likely to be much more complicated than the RNN we present, and it is not clearly doing iterative planning.


