研究生: |
王鈞平 Wang, Chun-Ping |
---|---|
論文名稱: |
六貫棋遊戲實作與強化學習應用 Hex Game Implement and Reinforcement Learning |
指導教授: | 林順喜 |
學位類別: |
碩士 Master |
系所名稱: |
資訊工程學系 Department of Computer Science and Information Engineering |
論文出版年: | 2019 |
畢業學年度: | 107 |
語文別: | 中文 |
論文頁數: | 51 |
中文關鍵詞: | 六貫棋 、強化學習 、深度學習 |
英文關鍵詞: | Hex |
DOI URL: | http://doi.org/10.6345/NTNU201900146 |
論文種類: | 學術論文 |
相關次數: | 點閱:164 下載:44 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
六貫棋,又被稱作納許棋,是一個雙人對局遊戲,最初是在1942年12月26日在丹麥報紙Politiken發表的一篇文章中出現,被稱作Polygon。於1948年時,由數學家約翰·福布斯·納許重新獨立發明出來,在最初被稱作納許棋(Nash)。後來於1952年遊戲玩具製造商Parker Brothers將其作為遊戲發行,將此遊戲命名為Hex。
在六貫棋的棋盤上由雙方輪流落子,雙方各擁有一組對邊,藉由佔領格子的方式將自己方的兩條邊連接起來以獲得勝利。在六貫棋當中已被約翰·福布斯·納許使用策略偷取的方式證明出六貫棋在先手方擁有必勝策略,而在路數小於8的盤面已經被完全破解出所有的必勝策略。
本研究試圖利用AlphaZero論文當中所提到的訓練方式,利用蒙地卡羅樹搜尋演算法搭配類神經網路訓練,嘗試藉由強化學習,從零人類知識開始只提供遊戲規則的方式,並針對3至4路的六貫棋棋盤,來訓練出能夠自我學習出完全破解3至4路的六貫棋的程式。依循此模式,在計算資源更為豐沛時,未來可以往更高路數的六貫棋實驗其破解的可能性。
Hex, also called Nash, is a game with two players. At first, it appeared and was called Polygon on a Denmark newspaper Politiken in 1942. In 1948, John Nash, who was a Mathematician, invented it and called it Nash. In 1952, Parker Brothers, which was a toy manufacturer, published it and called it Hex.
On the board of Hex, two players take turns placing a stone of their color on a single cell within the overall playing board. The goal for each player is to form a connected path of their own stones linking the opposing sides of the board marked by their colors, before their opponent connects his or her sides in a similar fashion. The first player to complete his or her connection wins the game.
Hex has been proved by John Nash by the strategy stealing argument so that the first player has a winning policy, and the boards with a size smaller than 8 have been solved by the program.
In this research, we try to use the AlphaZero training method, which uses Monte Carlo Tree Search Algorithm with Deep Learning, and try to use Reinforcement Learning to train a model without human knowledge to solve Hex with a board size of 3 and 4. According to this approach, we hope that the boards with larger sizes can also be solved using more computation resources in the future.
[1]. suragnair/alpha-zero-general, https://github.com/suragnair/alpha-zero-general。
[2]. David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepe & Demis Hassabis, “Mastering the game of Go without human knowledge”,Nature volume 550, pages 354–359 (19 October 2017).
[3]. Jakub Pawlewicz, Ryan Hayward, Philip Henderson, and Broderick Arneson, “Stronger Virtual Connections in Hex”.
[4]. Broderick Arneson, Ryan B. Hayward, Philip Henderson, “Monte Carlo Tree Search in Hex”.
[5]. Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun, “Deep Residual Learning for Image Recognition”.
[6]. 徐讚昇, 許舜欽, 陳志昌, 蔣益庭, 陳柏年, 劉雲青, 張紘睿, 蔡數真, 林庭羽, 范綱宇,電腦對局概論。2017,國立臺灣大學出版中心。
[7]. 維基百科:六貫棋介紹,
https://en.wikipedia.org/wiki/ Hex_(board_game)。
[8]. TensorFlow, https://www.tensorflow.org/。
[9]. 維基百科:深藍,
https://zh.wikipedia.org/zh-tw/%E6%B7%B1%E8%97%8D_(%E8%B6%85%E7%B4%9A%E9%9B%BB%E8%85%A6)。
[10]. 林大貴,TensorFlow+Keras深度學習人工智慧實務應用,博碩出版社。
[11]. Ryan B. Hayward, Noah Weninger, “Hex 2017: MoHex wins the 11x11 and 13x13 tournaments”.
[12]. H. J. van den Herik, J.W.H.M. Uiterwijk, and J.V. Rijswijck, “Games solved: Now and in the future,” Artificial Intelligence, Vol. 134, 2002, pp. 277–311.
[13]. 小狐狸事務所-使用 Keras 卷積神經網路 (CNN) 辨識手寫數字,http://yhhuang1966.blogspot.com/2018/04/keras-cnn.html。
[14]. Hexy plays Hex,http://vanshel.com/Hexy/。
[15]. 昌爸工作坊,http://www.mathland.idv.tw/fun/nashgame.htm。
[16]. Broderick Arneson, Ryan B. Hayward, Philip Henderson, “Solving Hex: Beyond Humans”.
[17]. Jacek Ma´ndziuk, “MCTS/UCT in solving real-life problems”.
[18]. Sabyasachi Sahoo, “Residual blocks - Building blocks of ResNet”.
[19]. Kazuki Yoshizoe, Akihiro Kishimoto, Martin Muller, “Lambda Depth-first Proof Number Search and its Application to Go”.