LessWrong · September 16
Neural Network Planning: A Simple Pathfinding Algorithm

This post describes a study of neural-network planning, in which a recurrent convolutional neural network (R-CNN) was trained to solve mazes. The author applies the network to a 33x33 maze and examines its "thinking" across different numbers of iterations. From visualizations of the outputs, the author infers that the network implements an algorithm known as "dead-end filling" to find a path between the source and the goal. The algorithm is simple, efficient, and yields workable solutions at any number of iterations, but the range of goals it can optimize for is narrow, making it of limited use for studying more capable mesa-optimizers.

🧠 **Exploring planning in neural networks**: The study trains a recurrent convolutional neural network (R-CNN) to solve mazes, aiming to probe planning ability in neural networks as a starting point for studying mesa-optimizers. The network was trained on 9x9 mazes and then applied to a larger 33x33 maze.

💡 **Inferring the "dead-end filling" algorithm**: By visualizing the R-CNN's outputs across iterations, the author infers that the network actually executes an algorithm known as "dead-end filling": dead ends are marked as negative regions, the source and goal as positive regions, and these labels then spread in a flood-fill-like fashion until a path is found. The method exploits the fact that a simply connected maze is a tree, which guarantees the procedure works.

⚙️ **Properties and limitations**: Dead-end filling is simple, fast, and able to produce workable solutions at any number of iterations. However, the range of goals it can optimize for is very limited, restricted to finding a connecting path between two points, so its usefulness for studying more capable, more general mesa-optimizers is limited.

Published on September 15, 2025 8:49 PM GMT

Work done as part of my work with FAR AI, back in February 2023. It's a small result but I want to get it out of my drafts folder. It was the start of the research that led to interpreting the Sokoban planning RNN.

I was trying to study neural networks that plan, in order to have examples of mesa-optimizers.

I trained the recurrent maze CNN from Bansal et al. 2022 to solve 9x9 mazes, and applied it to a 33x33 maze. The architecture in their paper is a recurrent convolutional NN (R-CNN) that is regularized to be able to stop its computation at any iteration: during training, the NN runs for a random number of iterations before it is scored.
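The setup described above can be sketched as follows. This is my own minimal reconstruction, not the authors' code: the layer widths, block structure, and `max_iters` are illustrative assumptions; the key detail from the post is that the network is scored after a random number of recurrent iterations, so it must produce a usable answer at any step.

```python
import torch
import torch.nn as nn

class RecurrentMazeCNN(nn.Module):
    """Hypothetical sketch of a recurrent conv net for maze solving:
    an encoder, a recurrent conv block applied `iters` times, and a
    decoder mapping the hidden state to per-pixel 2-class logits."""

    def __init__(self, width=64):
        super().__init__()
        self.encode = nn.Conv2d(3, width, 3, padding=1)
        self.step = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        )
        self.decode = nn.Conv2d(width, 2, 3, padding=1)

    def forward(self, x, iters):
        h = torch.relu(self.encode(x))
        for _ in range(iters):  # same weights reused each iteration
            h = self.step(h)
        return self.decode(h)  # (B, 2, H, W) logits

def loss_fn(model, maze, label, max_iters=30):
    # The regularizer from the post: sample how long the net "thinks"
    # before scoring it, so every iteration count must yield a solution.
    iters = int(torch.randint(1, max_iters + 1, ()))
    logits = model(maze, iters)
    return nn.functional.cross_entropy(logits, label)
```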

The task is supervised learning with inputs and labels like the following. The loss is cross-entropy.

Left: the test maze used throughout this post, a 72x72 RGB image. Right: the label corresponding to this maze: the path between source and goal (including both) is painted white, and everything else is black.

Unrolling the R-CNN’s thinking

I set out to interpret the R-CNN. I plotted the output of the CNN as it evolves at each step. 

40 iterations of the RNN’s thinking, unrolled across time from top-left to bottom-right. Yellow/bright is positive and blue/dark is negative. The plotted quantity is logits[1] - logits[0], which determines the “probability of white”.
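As a side note, the logit difference plotted above maps monotonically to the probability of the "white" class, since a two-class softmax reduces to a sigmoid of the difference:

```python
import math

def prob_white(logit_black, logit_white):
    # Softmax over two classes depends only on the logit difference:
    # P(white) = e^l1 / (e^l0 + e^l1) = 1 / (1 + e^-(l1 - l0))
    return 1.0 / (1.0 + math.exp(-(logit_white - logit_black)))
```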

First, note that a simply connected maze is a tree, so there is exactly one path between any two locations. Thus, the task is not to find the shortest path between source and goal, but any path at all! This is a very simple task.

Second, the RNN is encouraged to output workable solutions after any number of iterations. Thus, it’s going to find an algorithm that takes as few iterations as possible.

Roughly the algorithm

This is based on the evidence in the picture above. Here’s the algorithm that I think this R-CNN implements:

Dead-end filling

After interpreting this NN I read the Wikipedia page on maze-solving algorithms, and found that the algorithm above is known as dead-end filling. See this video on it.
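A minimal sketch of dead-end filling on a grid maze (my own illustration, not the author's or Wikipedia's code): each round fills, in parallel, every open cell that has at most one open neighbor, excluding the source and goal. On a tree-shaped maze, the surviving cells are exactly the unique source-goal path, and the number of rounds scales with the length of the longest dead-end corridor rather than the path length, which fits a recurrent conv net applying one local update per iteration.

```python
def dead_end_fill(grid, source, goal):
    """Dead-end filling on a grid maze.

    grid: 2D list, 1 = open cell, 0 = wall.
    source, goal: (row, col) tuples on open cells.
    Returns the set of open cells that survive filling; in a simply
    connected (tree-shaped) maze this is the unique source-goal path.
    """
    open_cells = {(r, c) for r, row in enumerate(grid)
                  for c, v in enumerate(row) if v}

    def degree(cell):
        r, c = cell
        return sum((r + dr, c + dc) in open_cells
                   for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))

    while True:
        # One parallel round: find all current dead ends, then remove
        # them simultaneously (a local rule a conv layer could apply).
        dead_ends = {cell for cell in open_cells
                     if cell not in (source, goal) and degree(cell) <= 1}
        if not dead_ends:
            return open_cells  # fixed point: only the path remains
        open_cells -= dead_ends
```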

Implications for alignment

The algorithm is pretty clever: it’s simple and quick. It can optimize for connecting any two points, and is thus technically a learned optimizer. However, the range of goals it is able to optimize for is very limited, and as such it is not very useful for studying more capable mesa-optimizers.




