
Graduate student: Wu, Tian-Yu (吳天宇)
Thesis title: On Implementing Breakthrough Game Based on AlphaZero General Framework (基於AlphaZero General Framework實現Breakthrough遊戲)
Advisor: Lin, Shun-Shii (林順喜)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of publication: 2019
Academic year of graduation: 107
Language: Chinese
Number of pages: 78
Chinese keywords: 電腦對局, AlphaZero, 突圍棋, 類神經網路, 深度學習
English keywords: Computer games, AlphaZero, Breakthrough, Neural network, Deep learning
DOI URL: http://doi.org/10.6345/NTNU201900129
Thesis type: Academic thesis
Usage: 305 views, 51 downloads
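  • In today's field of computer game-playing AI, the top programs for most board games are built on the AlphaZero approach, and their playing strength far exceeds that of earlier traditional programs. However, much of the development work in this architecture does not depend on the rules of the particular game, so developing a program for a new game incurs substantial repeated up-front development cost.
    This thesis therefore implements the game rules and search-tree handling in C++ and the neural network training in Python with the TensorFlow package, combining the two into a readable and comparatively efficient general-purpose AlphaZero framework. With this framework, users only need to change the game rules to start AlphaZero training. Compared with alpha-zero-general, the open-source program on GitHub that Surag Nair wrote entirely in Python, the framework improves single-threaded speed by 77.8% on Breakthrough.
    In addition, this thesis implements and tests three possible improvements intended to raise the playing strength produced by the overall AlphaZero training process. The modifications do not depend on the rules of any particular game, so games later plugged into the general-purpose AlphaZero framework can benefit as well. They are: the Replay method, which augments the training data; the MMoE (Multi-Gate Mixture-of-Experts) neural network architecture, applied to AlphaZero to strengthen the model's predictive ability; and the Quick Win method, which improves on how the original AlphaZero learns to win as quickly as possible by changing how the neural network's labels are assigned and by modifying the Monte Carlo tree search algorithm.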

    In the field of artificial intelligence, many game-playing programs built on the AlphaZero approach outperform programs that use traditional techniques. However, implementing a program for each new game from scratch with the AlphaZero framework incurs similar, repeated development costs.
    Our work implements an efficient and easy-to-use AlphaZero framework in C++ and Python. Users can start the whole AlphaZero training process immediately by modifying only the game module. Compared with the alpha-zero-general program published by Surag Nair on GitHub, our framework achieves a 77.8% single-threaded speedup on the game Breakthrough.
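    To illustrate what "modifying only the game module" could look like, here is a minimal Python sketch of a rules interface with a Breakthrough implementation. The class and method names (`Game`, `initial_state`, `legal_moves`, `next_state`, `winner`) are hypothetical and are not taken from the thesis or from alpha-zero-general; the thesis implements its rules in C++. The rules assumed are the standard ones: a piece moves one square forward, straight or diagonally, captures only diagonally, and wins by reaching the far row.

```python
import random
from abc import ABC, abstractmethod


class Game(ABC):
    """Hypothetical rules interface: the only part rewritten for a new game."""

    @abstractmethod
    def initial_state(self): ...

    @abstractmethod
    def legal_moves(self, state): ...

    @abstractmethod
    def next_state(self, state, move): ...

    @abstractmethod
    def winner(self, state): ...


class Breakthrough(Game):
    """Breakthrough on an n x n board. Player +1 moves toward higher rows,
    player -1 toward lower rows. A state is (board, player_to_move)."""

    def __init__(self, n=8):
        self.n = n

    def initial_state(self):
        n = self.n
        board = tuple(
            tuple(1 if r < 2 else -1 if r >= n - 2 else 0 for _ in range(n))
            for r in range(n)
        )
        return board, 1

    def legal_moves(self, state):
        board, player = state
        moves = []
        for r in range(self.n):
            for c in range(self.n):
                if board[r][c] != player:
                    continue
                nr = r + player  # one step forward for this player
                if not 0 <= nr < self.n:
                    continue
                for dc in (-1, 0, 1):
                    nc = c + dc
                    if not 0 <= nc < self.n:
                        continue
                    target = board[nr][nc]
                    # Any forward move may enter an empty square;
                    # only diagonal moves may capture an enemy piece.
                    if target == 0 or (dc != 0 and target == -player):
                        moves.append(((r, c), (nr, nc)))
        return moves

    def next_state(self, state, move):
        board, player = state
        (r, c), (nr, nc) = move
        rows = [list(row) for row in board]
        rows[r][c] = 0
        rows[nr][nc] = player
        return tuple(tuple(row) for row in rows), -player

    def winner(self, state):
        board, _ = state
        cells = [x for row in board for x in row]
        if 1 in board[self.n - 1] or -1 not in cells:
            return 1
        if -1 in board[0] or 1 not in cells:
            return -1
        return 0  # game still in progress


def random_playout(game, seed=0):
    """Play uniformly random moves to the end; works for any Game subclass."""
    rng = random.Random(seed)
    state = game.initial_state()
    plies = 0
    while game.winner(state) == 0:
        state = game.next_state(state, rng.choice(game.legal_moves(state)))
        plies += 1
    return game.winner(state), plies
```

    In this sketch `random_playout` stands in for the game-independent parts (search and self-play), which touch the game only through the four interface methods. Supporting another game would mean writing another `Game` subclass.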
    Further, we implement and test three possible improvements to the AlphaZero approach: the Replay method for augmenting training data, the MMoE (Multi-Gate Mixture-of-Experts) method for enhancing the neural network model, and the Quick Win method for learning how to win faster.
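    As a rough illustration of the MMoE idea, the sketch below follows the general formulation of Ma et al. (2018): several expert networks share the input, and each task has its own softmax gate that mixes the expert outputs. It is written in plain Python for readability, whereas the thesis uses TensorFlow. Treating the two tasks as AlphaZero's policy and value heads is an assumption made for this example, and the layer sizes and random weights are placeholders, not the thesis's configuration.

```python
import math
import random


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def relu(z):
    return [max(0.0, v) for v in z]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class MMoE:
    """Multi-gate mixture-of-experts layer with single-layer ReLU experts.

    For task k:  f_k(x) = sum_i g_k(x)[i] * expert_i(x),
    where        g_k(x) = softmax(W_gk @ x).
    """

    def __init__(self, in_dim, expert_dim, n_experts, n_tasks, seed=0):
        rng = random.Random(seed)
        # Placeholder random weights; a real model would learn these.
        self.experts = [random_matrix(rng, expert_dim, in_dim)
                        for _ in range(n_experts)]
        self.gates = [random_matrix(rng, n_experts, in_dim)
                      for _ in range(n_tasks)]

    def forward(self, x):
        expert_out = [relu(matvec(w, x)) for w in self.experts]
        task_features, gate_weights = [], []
        for gate in self.gates:
            g = softmax(matvec(gate, x))  # one weight per expert
            mixed = [
                sum(g[i] * expert_out[i][d] for i in range(len(g)))
                for d in range(len(expert_out[0]))
            ]
            task_features.append(mixed)
            gate_weights.append(g)
        return task_features, gate_weights
```

    Calling `MMoE(in_dim=4, expert_dim=3, n_experts=2, n_tasks=2).forward(x)` on a 4-element input returns one mixed feature vector per task along with the gate weights. In a full network, each task's feature vector would feed a task-specific output head.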

    Chapter 1 Introduction
    1.1 Research Background
    1.2 Research Objectives
    Chapter 2 Literature Review
    2.1 Neural Networks
    2.2 Monte Carlo Tree Search Algorithm
    2.3 AlphaZero
    2.4 MMoE
    Chapter 3 Methods and Procedures
    3.1 Research Subjects
    3.1.1 TensorFlow
    3.1.2 Alpha-zero-general
    3.1.3 Breakthrough
    3.2 AZGC Program Design
    3.2.1 AZGC Training Workflow
    3.2.2 AZGC Model Comparison Workflow
    3.2.3 AZGC Class Descriptions
    3.2.4 AZGC Main Program Description
    3.2.5 Bitboard in AZGC
    3.3 Replay Data Augmentation in AlphaZero
    3.4 MMoE in AlphaZero
    3.5 Quick Win in AlphaZero
    3.5.1 Quick Win in MCTS
    3.5.2 Quick Win in Network
    3.5.3 Quick Win in MCTS and Network
    Chapter 4 Experiments and Results
    4.1 Environment and Parameter Settings
    4.1.1 Environment Settings
    4.1.2 Parameter Settings
    4.2 Framework Validation
    4.2.1 TAAI 2018
    4.2.2 TCGA 2019
    4.3 Framework Speed
    4.4 Effect of Applying Replay Data Augmentation
    4.5 Effect of Applying MMoE
    4.6 Effect of Applying Quick Win
    4.6.1 Quick Win in MCTS
    4.6.2 Quick Win in Network
    4.6.3 Quick Win in MCTS and Network
    Chapter 5 Conclusions and Future Directions
    References

    Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ..., and Kudlur, M. (2016). "Tensorflow: A System for Large-Scale Machine Learning", Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, pp. 265-283.

    Bansal, Trapit, David Belanger, and Andrew McCallum (2016). "Ask the Gru: Multi-task Learning for Deep Text Recommendations", Proceedings of the 10th ACM Conference on Recommender Systems.

    Broderick Arneson, Ryan Hayward, and Philip Henderson (2009). "MoHex Wins Hex Tournament", ICGA Journal, 32 (2): 114–116.

    Chang-Shing Lee, Mei-Hui Wang, Guillaume Chaslot, Jean-Baptiste Hoock, Arpad Rimmel, Olivier Teytaud, Shang-Rong Tsai, Shun-Chin Hsu, and Tzung-Pei Hong (2009). "The Computational Intelligence of MoGo Revealed in Taiwan’s Computer Go Tournaments", IEEE Transactions on Computational Intelligence and AI in Games, 1 (1): 73–89.

    Chaslot, G. M. J., Winands, M. H., Herik, H. J. V. D., Uiterwijk, J. W., and Bouzy, B. (2008). "Progressive Strategies for Monte-Carlo Tree Search", New Mathematics and Natural Computation, 4(03), 343-357.

    Chih-Hung Chen, Wei-Lin Wu, Yu-Heng Chen, and Shun-Shii Lin (2018). "Some Improvements in Monte Carlo Tree Search Algorithms for Sudden Death Games", ICGA Journal, vol. 40, no. 4, pp. 460-470.

    Duong, L., Cohn, T., Bird, S., and Cook, P. (2015). "Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser", Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Vol. 2, pp. 845-850.

    Handscomb, K. (2001). "8×8 Game Design Competition: The Winning Game: Breakthrough... and Two Other Favorites", Abstract Games Magazine, 7, 8-9.

    Ioffe, S., and Szegedy, C. (2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", arXiv preprint arXiv:1502.03167.

    Isaac, A., and Lorentz, R. (2016). "Using Partial Tablebases in Breakthrough", Proceedings of the International Conference on Computers and Games, Springer, Cham, pp. 1-10.

    István Szita, Guillaume Chaslot, and Pieter Spronck (2009). "Monte-Carlo Tree Search in Settlers of Catan", Proceedings of the 12th International Conference of Advances in Computer Games, Pamplona, Spain.

    Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., ..., and Darrell, T. (2014). "Caffe: Convolutional Architecture for Fast Feature Embedding", Proceedings of the 22nd ACM international Conference on Multimedia, pp. 675-678.

    Jonathan Rubin, and Ian Watson (2011). "Computer Poker: A Review", Artificial Intelligence, 175 (5–6): 958–987.

    Kocsis, L. and Szepesvári, C. (2006). "Bandit Based Monte-carlo Planning", Proceedings of the European Conference on Machine Learning, Springer, Berlin, Heidelberg, pp. 282-293.

    Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). "Imagenet Classification with Deep Convolutional Neural Networks", Proceedings of the Advances in Neural Information Processing Systems, pp. 1097-1105.

    LeCun et al. (1989). "Backpropagation Applied to Handwritten Zip Code Recognition", Neural Computation, 1, pp. 541–551.

    Lorentz, R. and Horey, T. (2013). "Programming Breakthrough", Proceedings of the International Conference on Computers and Games, Springer, Cham, pp. 49-59.

    Ma, J., Zhao, Z., Yi, X., Chen, J., Hong, L., and Chi, E. H. (2018). "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-experts", Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1930-1939.

    McCulloch, Warren and Walter Pitts (1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity", Bulletin of Mathematical Biophysics, 5 (4): 115–133.

    Rémi Coulom (2007). "Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search", Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, May 29–31, 2006. Revised Papers. H. Jaap van den Herik, Paolo Ciancarini, H. H. L. M. Donkers (eds.). Springer. 2007: 72–83.

    Rich Caruana (1998). "Multitask Learning", Learning to Learn. Springer, 95–133.

    Rosenblatt, F. (1958). "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain", Psychological Review, 65(6): 386–408.

    Rumelhart, David E., Hinton, Geoffrey E., and Williams, Ronald J. (1986). "Learning Representations by Back-propagating Errors", Nature, 323 (6088): 533–536.

    Saffidine, A., Jouandeau, N., and Cazenave, T. (2011). "Solving Breakthrough with Race Patterns and Job-level Proof Number Search", Proceedings of the Advances in Computer Games, Springer, Berlin, Heidelberg, pp. 196-207.

    Schaeffer, J. (1989). "The History Heuristic and Alpha-beta Search Enhancements in Practice", IEEE Transactions on Pattern Analysis and Machine Intelligence, (11), 1203-1212.

    Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ..., and Dieleman, S. (2016). "Mastering the Game of Go with Deep Neural Networks and Tree Search", Nature, 529(7587), 484.

    Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., ..., and Lillicrap, T. (2017). "Mastering Chess and Shogi by Self-play with a General Reinforcement Learning Algorithm", arXiv preprint arXiv:1712.01815.

    Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., ..., and Chen, Y. (2017). "Mastering the Game of Go without Human Knowledge", Nature, 550(7676), 354.

    Yang, Y. and Hospedales, T. (2016). "Deep Multi-task Representation Learning: A Tensor Factorisation Approach", arXiv preprint arXiv:1605.06391.

    https://en.wikipedia.org/wiki/Breakthrough_(board_game)

    https://github.com/suragnair

    https://github.com/thedataincubator/data-science-blogs/blob/master/deep-learning-libraries.md
