Imitation Learning for the Game Hive: State Representation and PyTorch Model Design

Recent Questions - Artificial Intelligence Stack Exchange, September 29, 12:01


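This post examines how to build input and output state representations for an imitation learning (IL) agent in Hive, a game with no fixed board. Because Hive pieces are placed relative to one another and can stack, a conventional 2D grid representation does not apply directly. The author represents the state as Cartesian coordinates plus a piece enumeration and asks how to turn that into an input format a PyTorch model can accept. The post considers a grid-based approach drawn from existing work (the 'AZ-Hive case study'), a possible "quick fix" along the same lines, and a dictionary-based input format suggested by ChatGPT, with the aim of working out how to implement an IL model in PyTorch that captures the game's complexity.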

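🎯 **The challenge of representing Hive game state**: The core difficulty in Hive is its dynamic, non-grid board: pieces connect to one another along hexagonal edges and may stack. This makes a conventional fixed 2D grid hard to apply directly. The post describes a state representation built from each piece's Cartesian coordinates and its enumerated identity (such as 'white Beetle 1'), with successive states treated as time-series data.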

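📦 **Grid-based versus dictionary-based state input**: The post weighs two candidate representations. The first follows the 'AZ-Hive case study' paper and maps the Cartesian coordinates onto a 28x28 NumPy array, with the piece enumerations as array values. The second is a fixed-length dictionary format suggested by ChatGPT that holds the player turn, piece types, positions, and connections. The author weighs the suitability and dimensional complexity of both.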

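💡 **Designing PyTorch model inputs and outputs**: The central question is how to design input and output states in PyTorch for a game with no fixed board. The author wants to feed game states as a sequence of inputs (a time series) and is looking for a model that can capture the game's strategy and complexity. The post stresses building an effective input and output representation without rewriting the existing game logic, so that an imitation learning agent can be trained on top of it.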

I am currently working on my thesis project: an Imitation Learning (IL) agent that learns to play the board game Hive, which lacks a traditional 2D board. Pieces are placed relative to one another along their hexagonal edges. My current state representation is a set of Cartesian coordinates for each piece in play, along with an enum identifying the piece (e.g., 'white Beetle 1'), where the first piece played sits at (0, 0). It is also worth noting that the game isn't purely 2D either: pieces may, in some cases, stack on top of one another.

How do I create an input and an output state for this model in PyTorch, given that there is no fixed board state in this game? I wish to input game states as an array, where each move is a new input, in essence like a time series.

In the paper 'AZ-Hive case study' (found here), the game is represented by a 24x24 tile grid in which pieces are integers, and integers are added together to represent a stack of pieces. Is there perhaps a quick fix here: transposing my Cartesian coordinates onto a NumPy array of 28x28 zeroes, with the enumeration of each piece as its value?
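A minimal sketch of that grid idea, under stated assumptions rather than as the paper's method: the helper `encode_grid`, the tuple layout `(x, y, level, piece_id)`, and the cap `MAX_STACK` are all hypothetical. Instead of summing integers for a stack, it gives each stack level its own channel, so two different stacks can never produce the same cell value.

```python
import numpy as np

GRID = 28      # side length taken from the question; not a rule of Hive
MAX_STACK = 5  # assumed cap on stack height, one channel per level

def encode_grid(pieces, grid=GRID, max_stack=MAX_STACK):
    """Encode pieces as a (max_stack, grid, grid) integer array.

    pieces: iterable of (x, y, level, piece_id) with piece_id >= 1;
    0 is reserved for an empty cell.
    """
    board = np.zeros((max_stack, grid, grid), dtype=np.int64)
    offset = grid // 2  # shift so that (0, 0) lands in the middle of the array
    for x, y, level, piece_id in pieces:
        row, col = y + offset, x + offset
        # Explicit bounds check: a negative index would otherwise wrap silently.
        if not (0 <= row < grid and 0 <= col < grid and 0 <= level < max_stack):
            raise ValueError(f"piece {piece_id} at {(x, y, level)} is outside the grid")
        board[level, row, col] = piece_id
    return board

# Two pieces on the table and a third stacked on top of the second.
state = encode_grid([(0, 0, 0, 1), (1, 0, 0, 2), (1, 0, 1, 3)])
```

Because the hive can drift away from the origin as pieces move, a real encoder would probably need to re-centre the coordinates before each call; that step is left out here.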

When I asked ChatGPT, it suggested a fixed-length input dict, with the following snippet as an example:

    {
      "player_turn": "black",
      "pieces": [
        {
          "type": "ant",
          "position": (0, 0),
          "connections": [(0, 1), (1, 0)]
        },
        {
          "type": "grasshopper",
          "position": (0, 1),
          "connections": [(0, 0), (0, 2), (1, 1)]
        },
        {
          "type": "beetle",
          "position": (0, 2),
          "connections": [(0, 1), (1, 2)]
        }
      ]
    }

I am, however, unsure about this input, as it relies on far too many dimensions for me to get my head around.
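One way to tame those dimensions, sketched here as an assumption and not as what ChatGPT proposed: give every physical piece a fixed slot and flatten the dict into a single vector. The helper `encode_pieces`, the slot numbering, and `NUM_PIECES = 22` (11 pieces per player in the base game, no expansions) are hypothetical. The `connections` field is dropped because adjacency can be recomputed from the positions.

```python
import numpy as np

NUM_PIECES = 22  # assumed: 11 pieces per player, base game only

def encode_pieces(placed, white_to_move, num_pieces=NUM_PIECES):
    """Flatten a position into one fixed-length float vector.

    placed: dict {slot: (x, y, level)} for pieces currently in play;
    slots missing from the dict are treated as still in hand.
    """
    # One row per piece slot; columns are in_play, x, y, level.
    table = np.zeros((num_pieces, 4), dtype=np.float32)
    for slot, (x, y, level) in placed.items():
        table[slot] = (1.0, x, y, level)
    turn = np.array([1.0 if white_to_move else 0.0], dtype=np.float32)
    return np.concatenate([table.ravel(), turn])

# Slot 0 at the origin, slot 11 placed next to it, white to move.
vec = encode_pieces({0: (0, 0, 0), 11: (1, 0, 0)}, white_to_move=True)
```

The result always has length `num_pieces * 4 + 1`, however many pieces are on the table, so a game becomes a plain 2D array with one row per move.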

I wish to create a PyTorch model that can capture some of the intricacies and strategies of the game without having to rewrite my current game representation entirely, as my move validator and related code are extensive. Any feedback on the input and output of this model is greatly appreciated!
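For the model itself, one possible shape is sketched below. It assumes each game state has already been flattened into a fixed-length vector and that moves are enumerated into a fixed action list. The class `HivePolicy`, `state_dim=89`, and `num_actions=512` are placeholders, not values derived from Hive. A GRU reads the sequence of states, and a boolean mask, which could come from an existing move validator, removes illegal moves from the output.

```python
import torch
import torch.nn as nn

class HivePolicy(nn.Module):
    """Sequence of flat state vectors in, masked move logits out."""

    def __init__(self, state_dim=89, hidden=128, num_actions=512):
        super().__init__()
        self.gru = nn.GRU(state_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, states, legal_mask):
        # states: (batch, moves_so_far, state_dim)
        # legal_mask: (batch, num_actions) bool, True where a move is legal
        _, last_hidden = self.gru(states)
        logits = self.head(last_hidden[-1])
        # Illegal moves get -inf so softmax assigns them zero probability.
        return logits.masked_fill(~legal_mask, float("-inf"))

model = HivePolicy()
states = torch.zeros(2, 6, 89)                # 2 games, 6 moves each
legal = torch.zeros(2, 512, dtype=torch.bool)
legal[:, :10] = True                          # pretend only 10 moves are legal
logits = model(states, legal)

# Imitation learning step: cross-entropy against the expert's chosen move.
expert_moves = torch.tensor([3, 7])
loss = nn.functional.cross_entropy(logits, expert_moves)
```

Keeping the mask outside the network means the existing game logic stays the single source of truth for legality, and the model only has to rank the moves it is offered.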
