
Graduate Student: 偕為昭 (Jie, Wei-Zhao)
Thesis Title: 強化學習與遷移學習應用於六貫棋遊戲 (Investigating Reinforcement Learning and Transfer Learning in Hex Game)
Advisor: 林順喜 (Lin, Shun-Shii)
Oral Defense Committee: 吳毅成 (Wu, I-Chen), 顏士淨 (Yen, Shi-Jim), 陳志昌 (Chen, Jr-Chang), 周信宏 (Chou, Hsin-Hung), 林順喜 (Lin, Shun-Shii)
Oral Defense Date: 2023/06/28
Degree: Master's
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Year of Publication: 2023
Graduation Academic Year: 111
Language: Chinese
Number of Pages: 46
Chinese Keywords: 六貫棋、強化學習、遷移學習
English Keywords: Hex, Reinforcement Learning, Transfer Learning
Research Method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202300891
Document Type: Academic thesis
Hex is a two-player board game that first appeared in a Danish newspaper in 1942 under the name Polygon. In 1948 it was independently reinvented by the American mathematician John Forbes Nash Jr. and became known as Nash, and in 1952 it was published by the game manufacturer Parker Brothers under the name Hex. On the board, each pair of opposite sides (top-bottom and left-right) is assigned one of the two colors, and the players take turns placing stones, each trying to connect the pair of sides marked with their own color. Hex is a zero-sum game in which a draw is impossible. Previous research has solved the game for board sizes up to 9×9.

With the advent of AlphaZero, computer game-playing programs have advanced considerably, and programs built on this method generally exhibit strong playing strength. For Hex, the program MoHex, developed at the University of Alberta, deserves particular mention: it has consistently achieved excellent results in competitions and is still being improved.

This thesis attempts to perform reinforcement learning with the AlphaZero training framework, using board positions solved by MoHex as auxiliary data. Because training a model for larger boards requires considerable resources, we combine the framework with transfer learning: knowledge from already-solved small boards is used so that the early self-play stage produces better game records instead of starting from zero knowledge, thereby improving the training results of the large-board model. We also compare the effects of different parameter-transfer methods used during transfer learning.
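As a rough illustration of how a position solved by MoHex might be turned into a training example for the alpha-zero-general framework, the following Python sketch builds a (board, policy, value) triple from a solver-reported best move and winner. The board encoding, the function names, and the 7×7 board size are assumptions for illustration only, not the thesis's actual data pipeline; the 180-degree rotation corresponds to the board-flipping augmentation discussed in Chapter 3.

```python
import numpy as np

BOARD_SIZE = 7  # a small, already-solved board size (assumed for illustration)

def position_to_example(board, best_move, winner, player):
    """Convert one solved position into an AlphaZero-style training triple.

    board     : (N, N) array, +1 for player 1 stones, -1 for player 2, 0 empty
    best_move : (row, col) of the best move reported by the solver (e.g. MoHex)
    winner    : +1 or -1, the player who wins with perfect play from here
    player    : +1 or -1, the player to move
    """
    n = board.shape[0]
    # Policy target: all probability mass on the solver's best move.
    pi = np.zeros(n * n, dtype=np.float32)
    pi[best_move[0] * n + best_move[1]] = 1.0
    # Value target from the perspective of the player to move.
    z = 1.0 if winner == player else -1.0
    return board.copy(), pi, z

def augment_by_rotation(board, pi, z):
    """Hex positions are equivalent under 180-degree rotation, so each example
    can be duplicated by rotating both the board and the policy target."""
    n = board.shape[0]
    rot_board = np.rot90(board, 2)
    rot_pi = pi.reshape(n, n)[::-1, ::-1].reshape(-1)
    return rot_board, rot_pi, z

if __name__ == "__main__":
    # Tiny usage example with a hypothetical solver output on an empty board.
    board = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=np.int8)
    _, pi, z = position_to_example(board, best_move=(3, 3), winner=+1, player=+1)
    print(int(pi.argmax()), z)  # index of the target move, value label
```

Positions for which the solver reports no winning move would need a different value label; this sketch only covers the winning case.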
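For the parameter-transfer step, the sketch below illustrates one plausible "layer transfer" rule in PyTorch: copy every pre-trained tensor whose shape matches between the small-board and large-board networks, and leave shape-mismatched layers (typically the board-size-dependent heads) at their fresh initialization. This corresponds roughly to the simpler of the two mapping strategies listed in Chapter 3; the alternative that maps parameters to similar positions is not shown. Function and model names are assumptions.

```python
import torch

def transfer_matching_layers(small_model, large_model):
    """Copy pre-trained parameters from a small-board model into a large-board
    model wherever tensor shapes match; leave the rest freshly initialized."""
    small_state = small_model.state_dict()
    large_state = large_model.state_dict()

    transferred, skipped = [], []
    for name, tensor in small_state.items():
        if name in large_state and large_state[name].shape == tensor.shape:
            large_state[name] = tensor.clone()   # shapes match: reuse pre-trained weights
            transferred.append(name)
        else:
            skipped.append(name)                 # shape mismatch: keep fresh initialization

    large_model.load_state_dict(large_state)
    return transferred, skipped

# Usage (illustrative; HexNet is a hypothetical network class):
#   small = HexNet(board_size=7); large = HexNet(board_size=11)
#   moved, kept_fresh = transfer_matching_layers(small, large)
```

The resulting large-board model can then be used as the starting network for self-play in alpha-zero-general instead of a randomly initialized one.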

Chapter 1: Introduction
  1.1 Research Background
  1.2 Research Objectives
Chapter 2: Literature Review
  2.1 Hex Game Strategy
  2.2 AlphaZero
  2.3 Transfer Learning
  2.4 MoHex
  2.5 Convolutional Neural Networks
    2.5.1 Convolutional Layer
    2.5.2 Pooling Layer
    2.5.3 Fully Connected Layer
  2.6 The alpha-zero-general Open-Source Code
Chapter 3: Methods and Procedures
  3.1 Implementing Hex in alpha-zero-general
    3.1.1 Board Design and Win Detection
    3.1.2 Symmetric Boards
  3.2 Neural Network Architecture
  3.3 Generating Training Data with MoHex
    3.3.1 Converting Best Moves into Training Data
    3.3.2 Positions Without a Winning Move
    3.3.3 Flipping the Board to Obtain More Training Data
  3.4 Training with the Original AlphaZero
  3.5 Model Pre-training and Layer Transfer
  3.6 Parameter Mapping and Handling in Layer Transfer
    3.6.1 Directly Initializing Layers with Different Parameter Counts
    3.6.2 Mapping Parameters to Similar Positions
  3.7 Loading the Parameter-Transferred Model into alpha-zero-general
Chapter 4: Experimental Results
  4.1 Experimental Environment
  4.2 Validating the Method of Converting Solved-Position Information into Training Data
  4.3 AlphaZero-Framework Training with Pre-trained Parameters
    4.3.1 Comparison of the Version Not Using All Pre-trained Parameters with the Original
    4.3.2 Comparison of the Version Using All Pre-trained Parameters with the Original
    4.3.3 Comparison of Method 1 and Method 2
  4.4 Different Mapping Methods for Parameter Transfer
  4.5 Playing Against MoHex
Chapter 5: Conclusion and Future Directions
References

    [1] DeepMind, https://www.deepmind.com/.
    [2] Wikipedia: Hex, https://en.wikipedia.org/wiki/Hex_(board_game).
    [3] Jakub Pawlewicz, Ryan Hayward, Philip Henderson, Broderick Arneson, “Stronger Virtual Connections in Hex”, IEEE Trans. on Computational Intelligence and AI in Games, vol. 7, no. 2, June 2015, pp. 156-166.
    [4] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, Demis Hassabis, “Mastering the Game of Go without Human Knowledge”, Nature, vol. 550, Oct. 2017, pp. 354-359.
    [5] Lisa Torrey, Jude Shavlik, “Transfer learning”, in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques. Hershey, PA: IGI global, 2010, pp. 242-264.
    [6] cgao3/benzene-vanilla-cmake, https://github.com/cgao3/benzene-vanilla-cmake.
    [7] Broderick Arneson, Ryan B. Hayward, Philip Henderson, “Monte Carlo Tree Search in Hex”, IEEE Trans. on Computational Intelligence and AI in Games (special issue: Monte Carlo Techniques and Computer Go), vol. 2, no. 4, Dec. 2010, pp. 251-257.
    [8] Broderick Arneson, Ryan B. Hayward, Philip Henderson, “Solving Hex: Beyond Humans”, Computers and Games, CG 2010, Lecture Notes in Computer Science, vol. 6515, Springer Berlin/Heidelberg, 2011, pp. 1-10. https://doi.org/10.1007/978-3-642-17928-0_1.
    [9] Shih-Chieh Huang, Broderick Arneson, Ryan B. Hayward, Martin Müller, Jakub Pawlewicz, “MOHEX 2.0: A Pattern-Based MCTS Hex Player”, In: van den Herik, H., Iida, H., Plaat, A. (eds) Computers and Games. CG 2013. Lecture Notes in Computer Science, vol. 8427. Springer, Cham. https://doi.org/10.1007/978-3-319-09165-5_6.
    [10] Ryan Hayward, Noah Weninger, “Hex 2017: MoHex Wins the 11x11 and 13x13 Tournaments”, ICGA Journal, vol. 39, no. 3-4, Jan. 2017, pp. 222-227.
    [11] Yann LeCun, Leon Bottou, Yoshua Bengio, Patrick Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998, https://doi.org/10.1109/5.726791.
    [12] suragnair/alpha-zero-general, https://github.com/suragnair/alpha-zero-general.
    [13] Shantanu Thakoor, Surag Nair, Megha Jhunjhunwala, “Learning to Play Othello without Human Knowledge,” Stanford University CS238 Final Project Report, 2017.
    [14] PyTorch, https://pytorch.org/.
    [15] 王鈞平, “Implementation of the Hex Game and Application of Reinforcement Learning” (六貫棋遊戲實作與強化學習應用), Master's thesis, Department of Computer Science and Information Engineering, National Taiwan Normal University, 2019.
    [16] Dennis J.N.J. Soemers, Vegard Mella, Eric Piette, Matthew Stephenson, Cameron Browne, Olivier Teytaud, “Transfer of Fully Convolutional Policy-Value Networks between Games and Game Variants,” arXiv preprint, https://arxiv.org/abs/2102.12375, 2021.
