Graduate Student (研究生): 吳天宇 Wu, Tian-Yu
Thesis Title (論文名稱): 基於AlphaZero General Framework實現Breakthrough遊戲 (On Implementing Breakthrough Game Based on AlphaZero General Framework)
Advisor (指導教授): 林順喜 Lin, Shun-Shii
Degree (學位類別): Master (碩士)
Department (系所名稱): 資訊工程學系 (Department of Computer Science and Information Engineering)
Year of Publication (論文出版年): 2019
Academic Year of Graduation (畢業學年度): 107
Language (語文別): Chinese (中文)
Number of Pages (論文頁數): 78
Chinese Keywords (中文關鍵詞): 電腦對局、AlphaZero、突圍棋、類神經網路、深度學習
English Keywords (英文關鍵詞): Computer games, AlphaZero, Breakthrough, Neural network, Deep learning
DOI URL: http://doi.org/10.6345/NTNU201900129
Document Type (論文種類): Academic thesis (學術論文)
In today's field of artificial intelligence for computer games, the top programs for most board games are built on the AlphaZero framework and far surpass traditional programs in playing strength. However, much of the development work in this architecture does not depend on the rules of any particular game, so building a program for a new game incurs a large amount of repeated up-front development cost.
In this thesis, the game rules and search-tree handling are therefore implemented in C++, while the neural-network training is implemented in Python with TensorFlow. The two are combined into a readable and efficient general-purpose AlphaZero framework, in which the user only needs to change the game-rule module to start AlphaZero training. Compared with the alpha-zero-general program on GitHub, written entirely in Python by Surag Nair, our implementation achieves a 77.8% single-thread speedup on the game of Breakthrough.
In addition, this thesis implements and tests three possible improvements for raising the playing strength of the overall AlphaZero training pipeline. The modifications do not depend on the rules of any particular game, so any game later ported to the general framework can also benefit from them. They are: a Replay method that augments the training data; the application of the MMoE (Multi-gate Mixture-of-Experts) neural-network architecture to AlphaZero to strengthen the model's predictions; and a Quick Win method that improves on the original AlphaZero by learning to win as quickly as possible, modifying both the labeling of the neural-network training targets and the Monte-Carlo tree search algorithm.
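The Quick Win modification to the value labels can be sketched as follows. The discounting scheme, the function name, and the constants below are illustrative assumptions for exposition, not the exact formulation used in the thesis.

```python
# Hypothetical sketch of the Quick Win idea: instead of labeling every
# position of a decided self-play game with value +1/-1, discount the
# value label by the number of moves remaining until the end of the game,
# so positions in faster wins carry stronger labels.

def quick_win_labels(num_positions, winner, discount=0.98, floor=0.5):
    """Return one value label per position, from the first ply to the last.

    winner: +1 if the player to move at even plies won, -1 otherwise.
    Positions near the end of the game keep labels close to +/-1; earlier
    positions are discounted, but never below `floor` in magnitude.
    """
    labels = []
    for i in range(num_positions):
        moves_remaining = num_positions - 1 - i
        magnitude = max(floor, discount ** moves_remaining)
        # Alternate the sign so each label is from the perspective of
        # the player to move at that position.
        sign = winner if i % 2 == 0 else -winner
        labels.append(sign * magnitude)
    return labels
```

Under this scheme a shorter winning game yields labels closer to 1 throughout, which gives the value head a gradient toward quicker wins.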
In the field of artificial intelligence, many game-playing programs based on the AlphaZero approach outperform programs built with traditional techniques. However, a similar, repeated development cost is incurred every time a new game program is implemented from scratch with the AlphaZero framework.
Our work is to implement an efficient and easy-to-use AlphaZero framework in the C++ and Python programming languages. Users can start the whole AlphaZero training process immediately by modifying only the game module. Compared with the alpha-zero-general program written by Surag Nair on GitHub, we achieve a 77.8% single-thread speedup on the game of Breakthrough.
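As a rough sketch of what a user-supplied game module might look like, the following Python class follows the method-name conventions of Surag Nair's alpha-zero-general; the thesis's own C++ interface is not reproduced here and may differ.

```python
# Illustrative game module for Breakthrough in the alpha-zero-general
# style. Only a few representative methods are shown; the action
# encoding (3 forward directions per square) is an assumption.
import numpy as np

class BreakthroughGame:
    def __init__(self, n=8):
        self.n = n

    def getInitBoard(self):
        # Breakthrough starts with two full rows of pieces per side:
        # +1 for the first player (bottom), -1 for the second (top).
        board = np.zeros((self.n, self.n), dtype=np.int8)
        board[:2, :] = -1
        board[-2:, :] = 1
        return board

    def getBoardSize(self):
        return (self.n, self.n)

    def getActionSize(self):
        # One entry per (square, direction): a piece may move straight
        # or diagonally forward, i.e. 3 directions.
        return self.n * self.n * 3

    def getGameEnded(self, board, player):
        # A player wins by reaching the opponent's home row.
        if np.any(board[0, :] == 1):
            return player        # first player reached the top row
        if np.any(board[-1, :] == -1):
            return -player       # second player reached the bottom row
        return 0                 # ongoing (Breakthrough has no draws)
```

The point of such an interface is that the trainer, the Monte-Carlo tree search, and the neural network never inspect game rules directly; swapping in a new game means implementing only this class.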
Furthermore, we implement and test three possible improvements to the AlphaZero approach: the Replay method for augmenting training data, the MMoE (Multi-gate Mixture-of-Experts) architecture for enhancing the neural-network model, and the Quick Win method for learning to win faster.
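The MMoE idea, a shared pool of expert networks mixed by a separate softmax gate per task (here, the policy and value heads), can be illustrated with a minimal NumPy forward pass. All shapes, sizes, and names below are assumptions for illustration, not the thesis's actual architecture.

```python
# Minimal MMoE (Multi-gate Mixture-of-Experts) forward pass: every task
# sees the same experts, but mixes them with its own learned gate.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mmoe_forward(features, expert_ws, gate_ws):
    """features: (batch, d_in); expert_ws: list of (d_in, d_out) expert
    weights; gate_ws: dict task -> (d_in, num_experts) gate weights.
    Returns one mixed representation per task."""
    # Every expert processes the same shared features.
    expert_outs = np.stack([features @ w for w in expert_ws], axis=1)
    mixed = {}
    for task, gw in gate_ws.items():
        gates = softmax(features @ gw)             # (batch, num_experts)
        # Per-task weighted sum over the expert outputs.
        mixed[task] = np.einsum('be,bed->bd', gates, expert_outs)
    return mixed

batch, d_in, d_out, n_experts = 4, 32, 16, 3
features = rng.standard_normal((batch, d_in))
experts = [rng.standard_normal((d_in, d_out)) for _ in range(n_experts)]
gates = {'policy': rng.standard_normal((d_in, n_experts)),
         'value': rng.standard_normal((d_in, n_experts))}
out = mmoe_forward(features, experts, gates)
```

In an AlphaZero-style network the two task outputs would then feed separate policy and value towers, letting each head weight the shared experts differently instead of sharing one trunk outright.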
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ..., and Kudlur, M. (2016). "Tensorflow: A System for Large-Scale Machine Learning", Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, pp. 265-283.
Bansal, T., Belanger, D., and McCallum, A. (2016). "Ask the GRU: Multi-task Learning for Deep Text Recommendations", Proceedings of the 10th ACM Conference on Recommender Systems.
Arneson, B., Hayward, R., and Henderson, P. (2009). "MoHex Wins Hex Tournament", ICGA Journal, 32(2), 114-116.
Lee, C.-S., Wang, M.-H., Chaslot, G., Hoock, J.-B., Rimmel, A., Teytaud, O., Tsai, S.-R., Hsu, S.-C., and Hong, T.-P. (2009). "The Computational Intelligence of MoGo Revealed in Taiwan's Computer Go Tournaments", IEEE Transactions on Computational Intelligence and AI in Games, 1(1), 73-89.
Chaslot, G. M. J., Winands, M. H., Van Den Herik, H. J., Uiterwijk, J. W., and Bouzy, B. (2008). "Progressive Strategies for Monte-Carlo Tree Search", New Mathematics and Natural Computation, 4(3), 343-357.
Chen, C.-H., Wu, W.-L., Chen, Y.-H., and Lin, S.-S. (2018). "Some Improvements in Monte Carlo Tree Search Algorithms for Sudden Death Games", ICGA Journal, 40(4), 460-470.
Duong, L., Cohn, T., Bird, S., and Cook, P. (2015). "Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser", Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Vol. 2, pp. 845-850.
Handscomb, K. (2001). "8×8 Game Design Competition: The Winning Game: Breakthrough... and Two Other Favorites", Abstract Games Magazine, 7, 8-9.
Ioffe, S., and Szegedy, C. (2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", arXiv preprint arXiv:1502.03167.
Isaac, A., and Lorentz, R. (2016). "Using Partial Tablebases in Breakthrough", Proceedings of the International Conference on Computers and Games, Springer, Cham, pp. 1-10.
Szita, I., Chaslot, G., and Spronck, P. (2009). "Monte-Carlo Tree Search in Settlers of Catan", Proceedings of the 12th International Conference on Advances in Computer Games, Pamplona, Spain.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., ..., and Darrell, T. (2014). "Caffe: Convolutional Architecture for Fast Feature Embedding", Proceedings of the 22nd ACM international Conference on Multimedia, pp. 675-678.
Rubin, J. and Watson, I. (2011). "Computer Poker: A Review", Artificial Intelligence, 175(5-6), 958-987.
Kocsis, L. and Szepesvári, C. (2006). "Bandit Based Monte-carlo Planning", Proceedings of the European Conference on Machine Learning, Springer, Berlin, Heidelberg, pp. 282-293.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks", Proceedings of the Advances in Neural Information Processing Systems, pp. 1097-1105.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). "Backpropagation Applied to Handwritten Zip Code Recognition", Neural Computation, 1(4), 541-551.
Lorentz, R. and Horey, T. (2013). "Programming Breakthrough", Proceedings of the International Conference on Computers and Games, Springer, Cham, pp. 49-59.
Ma, J., Zhao, Z., Yi, X., Chen, J., Hong, L., and Chi, E. H. (2018). "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-experts", Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1930-1939.
McCulloch, W. and Pitts, W. (1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity", Bulletin of Mathematical Biophysics, 5(4), 115-133.
Coulom, R. (2007). "Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search", Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, Springer, pp. 72-83.
Caruana, R. (1998). "Multitask Learning", Learning to Learn, Springer, pp. 95-133.
Rosenblatt, F. (1958). "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain", Psychological Review, 65(6): 386–408.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). "Learning Representations by Back-propagating Errors", Nature, 323(6088), 533-536.
Saffidine, A., Jouandeau, N., and Cazenave, T. (2011). "Solving Breakthrough with Race Patterns and Job-level Proof Number Search", Proceedings of the Advances in Computer Games, Springer, Berlin, Heidelberg, pp. 196-207.
Schaeffer, J. (1989). "The History Heuristic and Alpha-beta Search Enhancements in Practice", IEEE Transactions on Pattern Analysis and Machine Intelligence, (11), 1203-1212.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ..., and Dieleman, S. (2016). "Mastering the Game of Go with Deep Neural Networks and Tree Search", Nature, 529(7587), 484-489.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., ..., and Lillicrap, T. (2017). "Mastering Chess and Shogi by Self-play with a General Reinforcement Learning Algorithm", arXiv preprint arXiv:1712.01815.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., ..., and Chen, Y. (2017). "Mastering the Game of Go without Human Knowledge", Nature, 550(7676), 354-359.
Yang, Y. and Hospedales, T. (2016). "Deep Multi-task Representation Learning: A Tensor Factorisation Approach", arXiv preprint arXiv:1605.06391.
https://en.wikipedia.org/wiki/Breakthrough_(board_game)
https://github.com/suragnair
https://github.com/thedataincubator/data-science-blogs/blob/master/deep-learning-libraries.md