Graduate student: | 簡沅亨 Chien, Yuan-Heng |
---|---|
Thesis title: | 基於AlphaZero作法之國際跳棋程式開發及研究 The development and research of a Draughts program based on AlphaZero approach |
Advisor: | 林順喜 Lin, Shun-Shii |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2020 |
Academic year of graduation: | 108 (ROC calendar) |
Language: | Chinese |
Number of pages: | 45 |
Chinese keywords: | 電腦對局、國際跳棋、神經網路、深度學習、AlphaZero |
English keywords: | computer game, draughts, neural network, deep learning, AlphaZero |
DOI URL: | http://doi.org/10.6345/NTNU202000197 |
Thesis type: | Academic thesis |
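International draughts developed out of traditional national draughts. According to tradition, in 1723 a Polish officer living in France changed the 64-square board to one with 100 squares, which is why the game is also known as "Polish draughts". Its special rules, the flying king and chained captures, make play interesting and varied, and the game is well loved by the public.
In recent years the AlphaZero algorithm has been extremely successful in training AI for many board games, so this study uses the AlphaZero architecture to implement an AI for international draughts. International draughts, however, has the problem of chained-capture paths: a single neural network output cannot fully express the path of a chained capture. This study therefore designs continuous moves, in which multiple move outputs from the neural network together describe the complete capture path.
To raise the training efficiency of AlphaZero for international draughts, this study uses the Big-Win strategy to accelerate training, so that the neural network is trained in the direction of big wins. After 100 training iterations, the neural network model trained with the Big-Win strategy has a higher winning rate than the model trained with the original AlphaZero version.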
Draughts evolved from earlier national variants of checkers. It is said that in 1723 a Polish military officer living in France enlarged the board from 64 squares to 100, which is why the game is also called "Polish Checkers". Draughts has special rules for the flying king and for continuous capturing, which make the game fun and varied, and it is popular with the public.
In recent years, the AlphaZero algorithm has achieved great success in playing various games. Hence, this research uses the AlphaZero architecture to implement a Draughts AI program. However, Draughts poses a continuous-capture path problem: a single neural network output cannot fully express the path of a continuous capture. This study therefore designs continuous moving, which uses a sequence of move outputs from the neural network to describe the path of a continuous capture completely.
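As an illustration of the continuous-moving idea, the sketch below splits one multi-jump capture into single-hop actions, each of which could be produced by a separate network output. The 1 to 50 square numbering is standard draughts notation, but the `(src - 1) * 50 + (dst - 1)` index, the function names, and the sample path are assumptions made for this example; the abstract does not specify the encoding actually used in the thesis.

```python
from typing import List, Tuple

# A 10x10 draughts board has 50 playable dark squares, numbered 1..50.
NUM_SQUARES = 50


def encode_step(src: int, dst: int) -> int:
    """Map one hop (src -> dst) to a single policy index (hypothetical encoding)."""
    for square in (src, dst):
        if not 1 <= square <= NUM_SQUARES:
            raise ValueError(f"square out of range: {square}")
    return (src - 1) * NUM_SQUARES + (dst - 1)


def decode_step(action: int) -> Tuple[int, int]:
    """Inverse of encode_step: recover (src, dst) from a policy index."""
    if not 0 <= action < NUM_SQUARES * NUM_SQUARES:
        raise ValueError(f"action out of range: {action}")
    src, dst = divmod(action, NUM_SQUARES)
    return src + 1, dst + 1


def path_to_steps(path: List[int]) -> List[int]:
    """Split a capture path [s0, s1, ..., sk] into k single-hop actions."""
    if len(path) < 2:
        raise ValueError("a path needs at least a start and one landing square")
    return [encode_step(a, b) for a, b in zip(path, path[1:])]


def steps_to_path(actions: List[int]) -> List[int]:
    """Rebuild the full path, checking that each hop starts where the last ended."""
    if not actions:
        raise ValueError("no actions given")
    path = list(decode_step(actions[0]))
    for action in actions[1:]:
        src, dst = decode_step(action)
        if src != path[-1]:
            raise ValueError("hops do not form one continuous path")
        path.append(dst)
    return path
```

For example, `path_to_steps([32, 21, 12, 3])` turns the capture written 32x21x12x3 into three consecutive actions, and `steps_to_path` recovers the original squares. In such a scheme the same side would keep the move until the last hop of the capture has been played.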
To improve the training efficiency of the AlphaZero-based Draughts program, we apply the Big-Win strategy to speed up training; it steers the neural network toward big wins. After 100 training iterations, the network model trained with the Big-Win strategy has a higher winning rate than the network model trained with the original AlphaZero version.
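The abstract does not define how a "big win" is measured, so the following is only a sketch of one way a win margin could shape the value target in AlphaZero-style training. The function name, the use of a piece-count difference as the margin, the cap of 20 (the number of pieces each side starts with), and the `base` weight are all assumptions for illustration, not the formula used in the thesis.

```python
def big_win_target(outcome: int, margin: int,
                   max_margin: int = 20, base: float = 0.5) -> float:
    """Return a value target in [-1, 1] that grows with the size of the win.

    outcome: +1 for a win, 0 for a draw, -1 for a loss.
    margin:  hypothetical piece-count advantage of the winner at game end.
    base:    target magnitude assigned to the narrowest possible win.
    """
    if outcome not in (-1, 0, 1):
        raise ValueError("outcome must be -1, 0, or +1")
    if max_margin <= 0 or not 0.0 <= base <= 1.0:
        raise ValueError("max_margin must be positive and base within [0, 1]")
    if outcome == 0:
        return 0.0
    # Clip the margin so the target never leaves [-1, 1].
    clipped = max(0, min(margin, max_margin))
    # A narrow win maps to +/-base; the largest win maps to +/-1.
    return outcome * (base + (1.0 - base) * clipped / max_margin)
```

Setting `base=1.0` makes every win map to +1 and every loss to -1, which is the plain win/draw/loss target of the original AlphaZero setup, so the two training variants can be compared by changing a single parameter.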