
Author: Chang, Nai-Yuan (張乃元)
Title: The Big Win Strategy: An improvement over AlphaZero approach for Othello (改進AlphaZero的大贏策略並應用於黑白棋)
Advisor: Lin, Shun-Shii (林順喜)
Degree: Master's
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Year of Publication: 2019
Graduation Academic Year: 107
Language: Chinese
Pages: 52
Keywords: computer game-playing, Othello, Monte Carlo method, neural networks, deep learning
DOI URL: http://doi.org/10.6345/NTNU201900357
Document Type: Academic thesis
Abstract:
DeepMind's AlphaZero algorithm has achieved great success in computer game-playing, reaching superhuman performance in many challenging games. We believe, however, that the algorithm still leaves room for improvement.
AlphaZero estimates only whether a game will end in a win, a loss, or a draw, and ignores how many points may ultimately be scored. In territory-based games such as Go and Othello, the final score largely determines the outcome of the game. We therefore propose the Big Win Strategy: incorporating a judgment of the score into the AlphaZero algorithm to improve its efficiency.
This study uses 8x8 Othello as the test game for the Big Win Strategy. We adopted and modified alpha-zero-general, an open-source implementation of the AlphaZero algorithm, for our experiments. After 100 training iterations, the model trained with the Big Win Strategy achieved a win rate of up to 78% against the original AlphaZero model trained without it, showing that the Big Win Strategy provides a significant improvement over the AlphaZero algorithm.
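Concretely, the idea is to replace AlphaZero's pure win/loss training target with a score-aware one. The following is a minimal sketch of that contrast for 8x8 Othello, written in Python (the language of alpha-zero-general). The function names and the particular normalization here are illustrative assumptions, not the thesis's exact Multi-Value Network or Value* formulation.

```python
# Minimal sketch (not the thesis's exact formulation) of the Big Win
# Strategy idea: instead of training the value head on a pure win/loss
# signal z in {-1, 0, +1}, train it on a score-aware target whose
# magnitude also rewards winning by a larger margin.

def alphazero_value(black_discs: int, white_discs: int) -> float:
    """Plain AlphaZero target: only the game result, from Black's view."""
    if black_discs > white_discs:
        return 1.0
    if black_discs < white_discs:
        return -1.0
    return 0.0

def big_win_value(black_discs: int, white_discs: int) -> float:
    """Score-aware target (illustrative): the sign still encodes the
    result, but the magnitude grows with the disc differential on the
    64-square 8x8 Othello board, so big wins are valued above narrow ones."""
    return (black_discs - white_discs) / 64.0

# Example: a 40-24 win and a 33-31 win look identical to plain AlphaZero
# (both 1.0) but differ under the score-aware target (0.25 vs. 0.03125).
print(alphazero_value(40, 24), big_win_value(40, 24))  # 1.0 0.25
print(alphazero_value(33, 31), big_win_value(33, 31))  # 1.0 0.03125
```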

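The 78% win rate reported above comes from pitting the two trained models against each other. Below is a hedged sketch of how such a head-to-head match could be run, modeled on the pit.py script in the alpha-zero-general repository; the module, class, and method names follow that repository as we understand it (circa 2018-2019) and should be treated as assumptions, and the checkpoint file names are placeholders.

```python
# Hedged sketch of a head-to-head evaluation with alpha-zero-general,
# patterned after the repository's pit.py. Names are assumptions taken
# from the repo layout, not a guaranteed API.
import numpy as np

import Arena
from MCTS import MCTS
from othello.OthelloGame import OthelloGame
from othello.pytorch.NNet import NNetWrapper as NNet
from utils import dotdict

g = OthelloGame(8)  # 8x8 Othello, as used in the thesis experiments

# Load the two models to compare: the Big Win Strategy model and the
# baseline AlphaZero model (checkpoint file names are placeholders).
n1 = NNet(g)
n1.load_checkpoint('./checkpoints/', 'big_win_100_iters.pth.tar')
n2 = NNet(g)
n2.load_checkpoint('./checkpoints/', 'baseline_100_iters.pth.tar')

args = dotdict({'numMCTSSims': 50, 'cpuct': 1.0})
mcts1, mcts2 = MCTS(g, n1, args), MCTS(g, n2, args)

# Greedy players: pick the move with the highest MCTS visit count.
player1 = lambda board: np.argmax(mcts1.getActionProb(board, temp=0))
player2 = lambda board: np.argmax(mcts2.getActionProb(board, temp=0))

arena = Arena.Arena(player1, player2, g, display=OthelloGame.display)
wins, losses, draws = arena.playGames(100, verbose=False)
print(f'Big Win model: {wins} wins, {losses} losses, {draws} draws')
```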

Table of Contents:
Chapter 1  Introduction
  1.1 Research Background
  1.2 Research Objectives
  1.3 Research Significance
Chapter 2  Literature Review
  2.1 Background
  2.2 Related Program Architectures
Chapter 3  Methods and Procedures
  3.1 alpha-zero-general
  3.2 The Big Win Strategy
  3.3 Modifying the Value Targets in Neural-Network Training
  3.4 Neural Network Architecture
  3.5 Monte Carlo Tree Search
Chapter 4  The Big Win Strategy
  4.1 Multi-Value Network
  4.2 Value*
Chapter 5  Experiments and Results
  5.1 The Multi-Value Network Method
  5.2 The Value* Method
Chapter 6  Conclusions and Future Work
References
Appendix 1: Paper Submitted to the International Conference CMECE 2018
Appendix 2: Experience from Participating in Competitions

