
Author: 陳毅泰
Chen, Yi-Tai
Thesis title: 基於強化學習之Surakarta棋程式開發與研究
Research and Development of Surakarta Program with Reinforcement Learning
Advisor: 林順喜
Lin, Shun-Shii
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar, 2018-2019)
Language: Chinese
Number of pages: 49
Keywords (Chinese): 電腦對局, Surakarta棋, AlphaZero, 神經網路, 深度學習
Keywords (English): computer games, Surakarta, AlphaZero, neural network, deep learning
DOI URL: http://doi.org/10.6345/NTNU201900787
Thesis type: Academic thesis
  • Surakarta is a two-player zero-sum board game that originated on the island of Java, Indonesia. Its original name is Permainan, which means "game" in Indonesian. It was later named Surakarta by the French, after the ancient city of Surakarta (also known locally as Solo, 梭羅) in central Java. The game's most distinctive feature is its capture rule: pieces capture by travelling along the inner or outer circuits that loop around the board, and a player wins by capturing all of the opponent's pieces.
    Beyond human play, Surakarta is a regular event at the Computer Olympiad, and many strong programs have been developed over the years. In the past two years, AlphaGo and AlphaZero have brought computer game playing to a new milestone, which creates an opportunity to raise the playing strength of Surakarta programs.
    This study uses the AlphaZero architecture, with different parameters and architectural improvements, to train and implement a Surakarta AI engine, together with a visualization platform for the game. In addition to the original single-neural-network version, the study tries a new multi-neural-network architecture that divides the game into three phases and trains three separate neural networks, one per phase: the "Opening Network", the "Middle Network", and the "Ending Network". The AlphaZero algorithm using the Ending Network was cross-validated against a DTC endgame tablebase, and its correct rate relative to the tablebase was as high as 99%.
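    The abstract does not state how the three phases are delimited or how the correct rate is computed, so the following is only a minimal sketch under stated assumptions. It assumes the phase is chosen from the total number of pieces left on the board (Surakarta starts with 24), with hypothetical thresholds `opening_min` and `ending_max`, and it assumes the correct rate is the fraction of positions in which the chosen move is among the tablebase-optimal moves. The names `select_phase`, `agreement_rate`, `choose_move`, and `optimal_moves` are illustrative and do not come from the thesis.

```python
from typing import Callable, Dict, Hashable, Iterable, Set

OPENING, MIDDLE, ENDING = "opening", "middle", "ending"


def select_phase(total_pieces: int, opening_min: int = 20, ending_max: int = 5) -> str:
    """Pick which phase network to query.

    Assumption: the phase depends only on the number of pieces on the board.
    Both thresholds are hypothetical placeholders, not values from the thesis.
    """
    if total_pieces >= opening_min:
        return OPENING
    if total_pieces <= ending_max:
        return ENDING
    return MIDDLE


def agreement_rate(
    positions: Iterable[Hashable],
    choose_move: Callable[[Hashable], Hashable],
    optimal_moves: Dict[Hashable, Set[Hashable]],
) -> float:
    """Fraction of positions where the engine's move is tablebase-optimal.

    `choose_move` stands in for the search guided by the Ending Network;
    `optimal_moves` stands in for a lookup into the endgame tablebase.
    """
    positions = list(positions)
    if not positions:
        raise ValueError("need at least one position to compare")
    hits = sum(1 for p in positions if choose_move(p) in optimal_moves[p])
    return hits / len(positions)
```

    With a dispatcher like this, self-play training data could be routed to the network that owns each position, and the Ending Network could be checked on every position the tablebase covers.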

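    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Objectives
    Chapter 2 Literature Review
        2.1 AlphaZero
        2.2 Alpha-Zero-General
        2.3 Endgame Tablebase
        2.4 Alternative Reinforcement Learning Approaches
    Chapter 3 Program Implementation
        3.1 Data Structures
        3.2 Capture Computation
        3.3 The Branch Cycle Problem in the Monte Carlo Tree
        3.4 Graphical Interface
    Chapter 4 Improvements and Ideas
        4.1 Training and Validation of the Endgame Network
        4.2 Multi-Neural-Network Architecture
    Chapter 5 Experiments and Results
        5.1 Experimental Environment and Parameter Settings
        5.2 Single-Neural-Network Architecture
        5.3 4-Piece Endgame Network
        5.4 5-Piece Endgame Network
        5.5 Multi-Neural-Network Architecture
    Chapter 6 Conclusions and Future Work
        6.1 Conclusions
        6.2 Future Work
    References
    Appendix
        Appendix 1 Experimental Data for the Single-Neural-Network Architecture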

