簡易檢索 / 詳目顯示

研究生: 陳俊豪
Chern, Jiunn-Haur
論文名稱: 中國跳棋對局程式研發與深度學習之探討
The Research of Chinese Checkers Program with Deep Learning
指導教授: 林順喜
學位類別: 碩士
Master
系所名稱: 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2019
畢業學年度: 107
語文別: 中文
論文頁數: 44
中文關鍵詞: 電腦對局中國跳棋蒙地卡羅樹搜索法AlphaZero
英文關鍵詞: computer games, Chinese Checkers, Monte Carlo Tree Search, AlphaZero
DOI URL: http://doi.org/10.6345/THE.NTNU.DCSIE.007.2019.B02
論文種類: 學術論文
相關次數: 點閱:263下載:34
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 中國跳棋遊戲是家喻戶曉的棋盤遊戲,但針對提升電腦對局棋力的研究並不多,過去以蒙地卡羅樹搜索法作為兩人中國跳棋AI的主要演算法,已經能表現出一定的棋力,但還是有改進的空間。中國跳棋的遊戲目標在於將己方的所有棋子前進至目的地,除了要使棋子能夠快速前進外,也要在適當的時機後退,取得攻防之間的平衡。
    本研究針對兩人中國跳棋遊戲的AI做改良,加入深度學習的做法,主體採用 AlphaZero 的框架來訓練類神經網路。為了在有限的硬體資源及時間下取得效果,嘗試加入針對遊戲特性的改進。先使用蒙地卡羅樹搜索法搭配隨機模擬,產生多種開局的棋譜作為預先訓練模組的訓練資料,再用此模組做後續自我對弈的學習,可避免一開始脆弱的神經網路無法結束遊戲。遊戲後期則使用單人遊戲的搜索法,以改善後期已知必勝或必敗盤面時,不會挑選最佳走步的問題。

    Chinese Checkers is a well-known board game, but it has received little research attention in efforts to improve the strength of a program. In the past, Monte Carlo Tree Search is an effective solution for two players chinese checkers, but it leaves much to be desired. The aim of the game is to be first to move all pieces into the opposite corner. In addition to making pieces forward fast, it is necessary to back off the pieces at appropriate time to achieve the balance between attack and defense.
    This thesis considers two-player case, and tries to improve Chinese Checkers program by convolution neural network. The new program is based on AlphaZero's framework to train neural network. In order to achieve results with limited hardware resources and time, we apply some improved strategies related to the game's characteristics. In order to avoid being unable to end the game, first, we create a pre-trained model gained from random playout results. Then, we train the neural network by a self-play reinforcement learning algorithm. Finally, we use the single-player MCTS at the latter stages of the game to prevent the situation that the obvious winner will not always select the best move.

    摘要 i Abstract ii 致謝 iii 目錄 iv 附表目錄 v 附圖目錄 vi 第一章 緒論 1 第一節 簡介 1 第二節 中國跳棋與棋規 2 第三節 研究目的 4 第二章 文獻探討 5 第一節 蒙地卡羅樹搜索演算法 5 第二節 MCTS 在跳棋上的運用 6 第三節 AlphaGo Zero 7 第四節 開源碼 Alpha-Zero-General 10 第三章 對局程式設計 14 第一節 使用MCTS設計跳棋AI 14 第二節 審局函數 15 第三節 Upper Confidence Bounds for Trees(UCT)公式 17 第四節 優勢平衡 17 第四章 開源碼修改 19 第一節 遊戲規則套用 19 第二節 神經網路架構 21 第三節 加入亂度 23 第四節 循環走步與預先訓練模組 24 第五節 必勝盤面走步修正 29 第五章 實驗與結果 35 第一節 實驗說明 35 第二節 實驗結果 37 第六章 結論與其他 39 第一節 結論 39 第二節 對局比賽近況 40 第三節 未來工作 42 參考文獻 43

    [1] G. Monks, Game of Skill, US Patent #383,653, 1888.
    [2] Browne, C.B., Colton, S., Cowling, P.I., Lucas, S.M., Perez, D., Powley, E., Rohlfshagen, P., Samothrakis, S., Tavener, S., Whitehouse, D. “A Survey of Monte Carlo Tree Search Methods”, IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 1–43 (2012).
    [3] Bell, George I. “The Shortest Game of Chinese Checkers and Related Problems.” Integers, Volume 9 Issue 1, 17-39 (2009)
    [4] Roschke M., Sturtevant N.R. “UCT Enhancements in Chinese Checkers Using an Endgame Database”. In: Cazenave T., Winands M., Iida H. (eds) Computer Games. CGW 2013. Communications in Computer and Information Science, vol 408. Springer, Cham, 57-70 (2014)
    [5] Antonoglou, I., Baker, L., Bolton, A., Chen, Y., van den Driessche, G., Graepel, T., Guez, A., Hassabis, D., Huang, A., Hubert, T., Hui, F., Lai, M., Lillicrap, T., Schrittwieser, J., Sifre, L., Silver, D. & Simonyan, K., “Mastering the Game of Go without Human Knowledge”, Nature, 550, 354-359 (2017).
    [6] Antonoglou, I., Graepel, T., Guez, A., Hassabis, D., Hubert, T., Kumaran, D., Lai, M., Lanctot, M., Lillicrap, T., Schrittwieser, J., Sifre, L., Silver, D., Simonyan, K. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”. (2017)
    [7] Wikipedia contributors. "Halma." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, (2018) https://en.wikipedia.org/wiki/Halma
    [8] 21屆ICGA電腦對局競賽官方網站
    https://www.tcga.tw/icga-computer-olympiad-2018/en/base.pl
    [9] Nair, S, “開源碼 Alpha-Zero-General”.
    https://github.com/suragnair/alpha-zero-general
    [10] Nair, S, “Alpha-Zero-General說明頁面”.
    http://web.stanford.edu/~surag/posts/alphazero.html
    [11] Wikipedia contributors. “Dirichlet distribution” Wikipedia, The Free Encyclopedia. (2019). https://en.wikipedia.org/wiki/Dirichlet_distribution
    [12] TAAI 電腦對局競賽官方網站 https://www.tcga.tw/taai2018/zh_TW/

    下載圖示
    QR CODE