National Taiwan Normal University Electronic Theses and Dissertations System

Graduate student:	簡沅亨 Chien, Yuan-Heng
Thesis title:	基於AlphaZero作法之國際跳棋程式開發及研究 The development and research of a Draughts program based on AlphaZero approach
Advisor:	林順喜 Lin, Shun-Shii
Degree:	Master
Department:	Department of Computer Science and Information Engineering
Year of publication:	2020
Academic year of graduation:	108 (ROC calendar; 2019-2020)
Language:	Chinese
Number of pages:	45
Keywords (Chinese):	電腦對局、國際跳棋、神經網路、深度學習、AlphaZero
Keywords (English):	computer game, draughts, neural network, deep learning, AlphaZero
DOI URL:	http://doi.org/10.6345/NTNU202000197
Thesis type:	Academic thesis

International draughts evolved from older national variants of checkers. It is said that in 1723 a Polish military officer living in France enlarged the board from 64 squares to 100, which is why the game is also called "Polish checkers". Its special rules, the flying king and multi-jump (continuous) captures, make play varied and interesting, and the game is widely popular.
In recent years, the AlphaZero algorithm has achieved great success in training AI for a variety of board games, so this research uses the AlphaZero architecture to implement an AI program for draughts. However, a capture in draughts can chain through several jumps, and a single neural network output cannot fully express the path of such a continuous capture. This study therefore designs "continuous moves": the network outputs one move at a time, and a sequence of these outputs describes the complete capture path.
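The idea of describing a capture path through several successive outputs can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it handles men only (no flying king), ignores the rule that the longest capture must be taken, and the names `State`, `single_jumps`, `step`, and `play_capture` are hypothetical. The `choose` argument stands in for the network's move selection.

```python
from dataclasses import dataclass, field
from typing import Optional

SIZE = 10  # international draughts is played on a 10x10 board
DIAGONALS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

@dataclass
class State:
    own: set                             # (row, col) squares of the side to move (men only)
    opp: set                             # squares of the opponent's men
    chain_at: Optional[tuple] = None     # piece that is in the middle of a capture
    captured: set = field(default_factory=set)  # jumped pieces not yet removed

def single_jumps(state, src):
    """List every one-jump capture (src, dst) available to the man on src."""
    moves = []
    for dr, dc in DIAGONALS:
        mid = (src[0] + dr, src[1] + dc)
        dst = (src[0] + 2 * dr, src[1] + 2 * dc)
        on_board = 0 <= dst[0] < SIZE and 0 <= dst[1] < SIZE
        # A piece may not be jumped twice, and jumped pieces still block squares.
        if (on_board and mid in state.opp and mid not in state.captured
                and dst not in state.own and dst not in state.opp):
            moves.append((src, dst))
    return moves

def step(state, action):
    """Apply one single-jump action; return (new_state, same_side_moves_again)."""
    src, dst = action
    mid = ((src[0] + dst[0]) // 2, (src[1] + dst[1]) // 2)
    nxt = State(own=(state.own - {src}) | {dst}, opp=set(state.opp),
                chain_at=dst, captured=state.captured | {mid})
    if single_jumps(nxt, dst):
        return nxt, True        # capture continues: same side outputs another move
    nxt.opp -= nxt.captured     # capture finished: remove all jumped pieces at once
    nxt.captured = set()
    nxt.chain_at = None
    return nxt, False

def play_capture(state, src, choose=lambda moves: moves[0]):
    """Build a full capture path from repeated single-move outputs."""
    path, again = [src], True
    while again:
        moves = single_jumps(state, path[-1])
        if not moves:
            break
        action = choose(moves)
        state, again = step(state, action)
        path.append(action[1])
    return state, path

start = State(own={(6, 1)}, opp={(5, 2), (3, 4)})
final, path = play_capture(start, (6, 1))
print(path)
```

Keeping the landing square (`chain_at`) and the set of already-jumped pieces in the state is what lets each single output be interpreted correctly as part of an ongoing capture.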
To improve the training efficiency of the AlphaZero-based draughts program, we apply the Big-Win strategy to speed up training by steering the neural network toward wins by a large margin. After 100 training iterations, the model trained with the Big-Win strategy achieves a higher win rate than the model trained with the original AlphaZero method.
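The abstract does not state how the Big-Win strategy shapes the training signal. Purely as an illustrative assumption, one way to favor large wins is to scale the usual win/loss value target by the final piece margin. The function name `big_win_target`, the `bonus` weight, and the use of piece difference as the margin are all hypothetical and are not taken from the thesis.

```python
def big_win_target(outcome, own_pieces, opp_pieces, max_pieces=20, bonus=0.5):
    """Return a value target in [-1, 1] that grows with the winning margin.

    outcome: +1 for a win, -1 for a loss, 0 for a draw (side to move's view).
    own_pieces / opp_pieces: piece counts at the end of the game.
    max_pieces: pieces per side at the start (20 in international draughts).
    bonus: assumed share of the target that depends on the margin.
    """
    if outcome == 0:
        return 0.0
    margin = min(1.0, abs(own_pieces - opp_pieces) / max_pieces)
    base = 1.0 - bonus          # every win or loss is worth at least this much
    return outcome * (base + bonus * margin)

# A narrow win earns a smaller target than a crushing win.
narrow = big_win_target(+1, own_pieces=2, opp_pieces=0)
crushing = big_win_target(+1, own_pieces=16, opp_pieces=0)
print(narrow < crushing)
```

With `bonus=0` this reduces to the plain AlphaZero-style target of +1, 0, or -1, which makes the two training setups easy to compare in a sketch like this.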

Chapter 1 Introduction
1 Research Background
2 Research Objectives

Chapter 2 Literature Review
1 AlphaGo
2 AlphaGo Zero
3 Alpha-Zero-General
4 Rules of International Draughts

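Chapter 3 Program Implementation
1 Board Representation for International Draughts
1.1 Board Representation Architecture
1.2 Man Board and King Board
1.3 Continuous-Move Position Board and King Continuous-Move Type Board
1.4 Captured-Piece Position Board
2 Move Design
2.1 Difficulties in Designing Moves
2.2 Move Design for International Draughts
2.3 The Multiple-Path Move Problem
2.4 Ordinary Moves and Continuous Moves
2.5 Reasons for Using the Continuous-Move Position Board and the King Continuous-Move Type Board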

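Chapter 4 Experimental Improvements and Ideas
1 AlphaZero Training Efficiency: Problems and Ideas
2 The Big-Win Strategy
3 Applying the Big-Win Strategy to International Draughts
4 Mid-Game Big-Win Strategy for International Draughts
4.1 Experimental Concept of the Mid-Game Big-Win Strategy
4.2 Architecture Design of the Mid-Game Big-Win Strategy for International Draughts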

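Chapter 5 Experimental Results
1 Experimental Environment and Parameter Settings
2 Training International Draughts with the Original AlphaZero
3 Training International Draughts with the Big-Win Strategy
4 Training International Draughts with the Mid-Game Big-Win Strategy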

Chapter 6 Conclusions and Future Work

References
                                

