研究生: |
陳俊豪 Chern, Jiunn-Haur |
---|---|
論文名稱: |
中國跳棋對局程式研發與深度學習之探討 The Research of Chinese Checkers Program with Deep Learning |
指導教授: | 林順喜 |
學位類別: |
碩士 Master |
系所名稱: |
資訊工程學系 Department of Computer Science and Information Engineering |
論文出版年: | 2019 |
畢業學年度: | 107 |
語文別: | 中文 |
論文頁數: | 44 |
中文關鍵詞: | 電腦對局 、中國跳棋 、蒙地卡羅樹搜索法 、AlphaZero |
英文關鍵詞: | computer games, Chinese Checkers, Monte Carlo Tree Search, AlphaZero |
DOI URL: | http://doi.org/10.6345/THE.NTNU.DCSIE.007.2019.B02 |
論文種類: | 學術論文 |
相關次數: | 點閱:263 下載:34 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
中國跳棋遊戲是家喻戶曉的棋盤遊戲,但針對提升電腦對局棋力的研究並不多,過去以蒙地卡羅樹搜索法作為兩人中國跳棋AI的主要演算法,已經能表現出一定的棋力,但還是有改進的空間。中國跳棋的遊戲目標在於將己方的所有棋子前進至目的地,除了要使棋子能夠快速前進外,也要在適當的時機後退,取得攻防之間的平衡。
本研究針對兩人中國跳棋遊戲的AI做改良,加入深度學習的做法,主體採用 AlphaZero 的框架來訓練類神經網路。為了在有限的硬體資源及時間下取得效果,嘗試加入針對遊戲特性的改進。先使用蒙地卡羅樹搜索法搭配隨機模擬,產生多種開局的棋譜作為預先訓練模組的訓練資料,再用此模組做後續自我對弈的學習,可避免一開始脆弱的神經網路無法結束遊戲。遊戲後期則使用單人遊戲的搜索法,以改善後期已知必勝或必敗盤面時,不會挑選最佳走步的問題。
Chinese Checkers is a well-known board game, but it has received little research attention in efforts to improve the strength of a program. In the past, Monte Carlo Tree Search is an effective solution for two players chinese checkers, but it leaves much to be desired. The aim of the game is to be first to move all pieces into the opposite corner. In addition to making pieces forward fast, it is necessary to back off the pieces at appropriate time to achieve the balance between attack and defense.
This thesis considers two-player case, and tries to improve Chinese Checkers program by convolution neural network. The new program is based on AlphaZero's framework to train neural network. In order to achieve results with limited hardware resources and time, we apply some improved strategies related to the game's characteristics. In order to avoid being unable to end the game, first, we create a pre-trained model gained from random playout results. Then, we train the neural network by a self-play reinforcement learning algorithm. Finally, we use the single-player MCTS at the latter stages of the game to prevent the situation that the obvious winner will not always select the best move.
[1] G. Monks, Game of Skill, US Patent #383,653, 1888.
[2] Browne, C.B., Colton, S., Cowling, P.I., Lucas, S.M., Perez, D., Powley, E., Rohlfshagen, P., Samothrakis, S., Tavener, S., Whitehouse, D. “A Survey of Monte Carlo Tree Search Methods”, IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 1–43 (2012).
[3] Bell, George I. “The Shortest Game of Chinese Checkers and Related Problems.” Integers, Volume 9 Issue 1, 17-39 (2009)
[4] Roschke M., Sturtevant N.R. “UCT Enhancements in Chinese Checkers Using an Endgame Database”. In: Cazenave T., Winands M., Iida H. (eds) Computer Games. CGW 2013. Communications in Computer and Information Science, vol 408. Springer, Cham, 57-70 (2014)
[5] Antonoglou, I., Baker, L., Bolton, A., Chen, Y., van den Driessche, G., Graepel, T., Guez, A., Hassabis, D., Huang, A., Hubert, T., Hui, F., Lai, M., Lillicrap, T., Schrittwieser, J., Sifre, L., Silver, D. & Simonyan, K., “Mastering the Game of Go without Human Knowledge”, Nature, 550, 354-359 (2017).
[6] Antonoglou, I., Graepel, T., Guez, A., Hassabis, D., Hubert, T., Kumaran, D., Lai, M., Lanctot, M., Lillicrap, T., Schrittwieser, J., Sifre, L., Silver, D., Simonyan, K. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”. (2017)
[7] Wikipedia contributors. "Halma." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, (2018) https://en.wikipedia.org/wiki/Halma
[8] 21屆ICGA電腦對局競賽官方網站
https://www.tcga.tw/icga-computer-olympiad-2018/en/base.pl
[9] Nair, S, “開源碼 Alpha-Zero-General”.
https://github.com/suragnair/alpha-zero-general
[10] Nair, S, “Alpha-Zero-General說明頁面”.
http://web.stanford.edu/~surag/posts/alphazero.html
[11] Wikipedia contributors. “Dirichlet distribution” Wikipedia, The Free Encyclopedia. (2019). https://en.wikipedia.org/wiki/Dirichlet_distribution
[12] TAAI 電腦對局競賽官方網站 https://www.tcga.tw/taai2018/zh_TW/