
Author: 黃士豪
Huang, Shih-Hao
Thesis title: 利用有利條件訓練神經網路-以六子棋為例
Exploiting Favorable Conditions for Training Neural Networks-Taking Connect6 for Example
Advisor: 林順喜
Lin, Shun-Shii
Degree: Master
Department: 資訊工程學系
Department of Computer Science and Information Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 52
Chinese keywords: 電腦對局, 六子棋, 蒙地卡羅法, 神經網路, 深度學習
English keywords: computer games, Connect6, Monte Carlo method, neural network, deep learning
DOI URL: http://doi.org/10.6345/NTNU201900444
Thesis type: Academic thesis
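  • DeepMind's Go program AlphaGo achieved a major milestone by defeating the Korean professional 9-dan player Lee Sedol. DeepMind then proposed the general-purpose algorithm AlphaZero, which not only surpassed AlphaGo at Go but was also trained successfully on chess and shogi. However, beyond an effective algorithm, DeepMind also relied on enormous computing resources to reach top-level playing strength in a game as complex and varied as Go.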

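    The game studied in this research is Connect6, invented by Professor I-Chen Wu (吳毅成) of the Department of Computer Science, National Chiao Tung University. Connect6 not only remedies the first-player advantage of Gomoku but also has higher complexity, and, compared with Go, it has some known advantageous strategies.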

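    The experiments use the AlphaZero algorithm and apply some known advantageous strategies to neural network training, in order to reduce the Monte Carlo tree search space and avoid useless board positions. The expectation is that this yields more effective features and improves the efficiency of neural network training under limited hardware resources.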

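    For Connect6, we try two methods to improve neural network training. The first restricts the range of legal moves. The idea comes from the established result that, in Connect6, a side that plays away from the fighting area is bound to lose. During training we therefore restrict moves to the 3×3 neighborhood (九宮格) of the existing stones, so that the network learns close-contact fighting more quickly.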

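    The second method applies domain knowledge. Based on the characteristics of Connect6, we designed sure-win moves and defensive moves and modified the expansion step of Monte Carlo tree search so that a node judged to be a sure-win move or a defensive move is expanded first. Besides cutting unnecessary search, this also lets the neural network learn attacking and defending behavior.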

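    Both methods produced successful results. In the experimental data, restricting the move range and applying domain knowledge each yielded a neural network model that surpassed the original version with less training time and a smaller amount of training. The model trained with domain knowledge played particularly well and achieved a rather high win rate. We therefore infer that training neural networks under favorable conditions is quite feasible.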

    DeepMind developed AlphaGo, a computer Go program that beat the South Korean professional Go player Lee Sedol. Soon afterwards, DeepMind introduced a more general algorithm, AlphaZero, which not only outperformed AlphaGo at Go but also succeeded at chess and shogi. However, besides an effective algorithm, DeepMind used huge computing resources to master the game of Go.

    This research develops a program for Connect6, a game proposed by Professor I-Chen Wu. Connect6 is similar to Gomoku, but it is more complex and eliminates the first player's advantage. In contrast to Go, Connect6 has some known advantageous strategies.

    The experiments apply some of these advantageous strategies within the AlphaZero algorithm for training neural networks. The methods reduce the Monte Carlo tree search space and avoid invalid boards. We expect the approach to help the neural networks learn better features and to improve training efficiency on limited hardware.

    For Connect6, we add two methods to neural network training. First, because it has been shown that playing breakaway moves (moves far from the existing stones) loses the game in Connect6, this research limits the valid moves to the 3×3 grid areas around the existing stones and trains a new model that learns to avoid breakaway moves.
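
    The move restriction described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the board encoding, the function name `restricted_moves`, and the choice to allow only the centre point on an empty board are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

EMPTY = 0  # assumed encoding: 0 = empty, 1 = black, 2 = white

Cell = Tuple[int, int]


def restricted_moves(board: List[List[int]]) -> Set[Cell]:
    """Return the empty cells inside the 3x3 block centred on any existing stone."""
    size = len(board)
    stones = [(r, c) for r in range(size) for c in range(size)
              if board[r][c] != EMPTY]
    if not stones:
        # Assumption: with no stones on the board, allow only the centre point.
        return {(size // 2, size // 2)}
    moves: Set[Cell] = set()
    for r, c in stones:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                # Keep only on-board, unoccupied neighbours.
                if 0 <= nr < size and 0 <= nc < size and board[nr][nc] == EMPTY:
                    moves.add((nr, nc))
    return moves


if __name__ == "__main__":
    board = [[EMPTY] * 19 for _ in range(19)]
    board[9][9] = 1
    print(len(restricted_moves(board)))  # 8 neighbours of a single centre stone
```

    In an AlphaZero-style training loop, a set like this would typically be used to mask the policy output and to limit the children created during tree search, so that far-away moves are never explored.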

    Second, this research applies domain knowledge. Connect6 is known to have some advantageous moves. This research focuses on threat moves and defensive moves and expands them preferentially in Monte Carlo tree search. This reduces the search space and lets the neural networks learn attack and defense features.
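
    One way to picture the preferential expansion is the hedged Python sketch below. The abstract does not define threat or defensive moves precisely, so the sketch assumes a simple rule: a cell is a winning candidate if it lies in a window of six cells that contains no opposing stone and at most two empty cells (a Connect6 turn places two stones), and a defensive candidate if the same holds from the opponent's side. The names `completing_cells` and `moves_to_expand` are hypothetical.

```python
from typing import List, Set, Tuple

EMPTY, BLACK, WHITE = 0, 1, 2  # assumed board encoding
DIRECTIONS = ((0, 1), (1, 0), (1, 1), (1, -1))
Cell = Tuple[int, int]


def completing_cells(board: List[List[int]], player: int,
                     stones_left: int = 2) -> Set[Cell]:
    """Empty cells that complete six in a row for `player` within `stones_left` stones."""
    size = len(board)
    cells: Set[Cell] = set()
    for r in range(size):
        for c in range(size):
            for dr, dc in DIRECTIONS:
                end_r, end_c = r + 5 * dr, c + 5 * dc
                if not (0 <= end_r < size and 0 <= end_c < size):
                    continue  # the six-cell window would leave the board
                window = [(r + i * dr, c + i * dc) for i in range(6)]
                values = [board[y][x] for y, x in window]
                empties = [cell for cell, v in zip(window, values) if v == EMPTY]
                own = values.count(player)
                # No opposing stone in the window and few enough gaps to fill.
                if own + len(empties) == 6 and 1 <= len(empties) <= stones_left:
                    cells.update(empties)
    return cells


def moves_to_expand(board: List[List[int]], player: int,
                    legal_moves: Set[Cell], stones_left: int = 2) -> Set[Cell]:
    """Prefer winning cells, then blocking cells, else fall back to all legal moves."""
    opponent = WHITE if player == BLACK else BLACK
    winning = completing_cells(board, player, stones_left) & legal_moves
    if winning:
        return winning
    # The opponent will have two stones on the next turn.
    defensive = completing_cells(board, opponent, 2) & legal_moves
    if defensive:
        return defensive
    return set(legal_moves)


if __name__ == "__main__":
    board = [[EMPTY] * 19 for _ in range(19)]
    for col in (5, 6, 7, 8):
        board[9][col] = BLACK
    legal = {(r, c) for r in range(19) for c in range(19) if board[r][c] == EMPTY}
    print(sorted(moves_to_expand(board, WHITE, legal)))
    # [(9, 3), (9, 4), (9, 9), (9, 10)]
```

    A tree search built on this would create child nodes only for the returned cells when a winning or blocking candidate exists. Whether the thesis prunes the other moves entirely or merely expands the preferred ones first is not stated in the abstract.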

    According to the experiments, limiting the area of valid moves and applying domain knowledge are both feasible. These methods achieve a higher win rate with less training time and less tree search time. Based on the results, we infer that the new methods are better than the original version.

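    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Thesis Organization
    Chapter 2 Literature Review
      2.1 Overview of the AlphaGo Training Process
      2.2 Overview of the AlphaZero Training Process
      2.3 Rules of Connect6
      2.4 History of Connect6 Computer Programs
      2.5 Open-Source Connect6 Code
    Chapter 3 Methods
      3.1 Restricting the Move Range
      3.2 Applying Domain Knowledge: Sure-Win Moves
      3.3 Applying Domain Knowledge: Defensive Moves
      3.4 Applying Domain Knowledge: Workflow Design
    Chapter 4 Experiments and Results
      4.1 Test Platform and Parameter Settings
      4.2 Training Results of the Original Version
      4.3 Training Results with the Restricted Move Range
      4.4 Training Results with Domain Knowledge
      4.5 Evaluation of Applying the New Methods to the Old Model
      4.6 Overall Comparison of All Versions
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References
    Appendix A
    Appendix B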

