Graduate student: | 蔡宜憲 Tsai, Yi-Sian |
---|---|
Thesis title: | Enhancing the Connect Four Program Through the Combination of Multiple Policy Value MCTS and Population Based Training |
Advisor: | 林順喜 Lin, Shun-Shii |
Oral defense committee: | 林順喜 Lin, Shun-Shii; 吳毅成 Wu, I-Chen; 顏士淨 Yen, Shi-Jim; 陳志昌 Chen, Jr-Chang; 周信宏 Chou, Hsin-Hung |
Oral defense date: | 2023/07/20 |
Degree: | Master |
Department: | Department of Computer Science and Information Engineering |
Year of publication: | 2024 |
Academic year of graduation: | 112 |
Language: | Chinese |
Number of pages: | 47 |
Keywords: | Computer Games, Connect Four, Deep Learning, AlphaZero, Multiple Policy Value Monte Carlo Tree Search |
Research method: | Experimental design |
DOI URL: | http://doi.org/10.6345/NTNU202400125 |
Thesis type: | Academic thesis |
Computer game playing is one of the oldest and best-known applications of artificial intelligence in computer science and engineering, and AlphaZero is a powerful reinforcement learning algorithm for board games.
AlphaZero combines Monte Carlo Tree Search (MCTS) with deep neural networks. Larger networks evaluate positions more accurately, while smaller networks are cheaper and faster to evaluate, so the two must be balanced under a limited budget. Multiple Policy Value Monte Carlo Tree Search combines neural networks of different sizes and retains the advantages of each.
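The cost trade-off between a small and a large network can be illustrated with simple arithmetic. The sketch below is only an illustration of a fixed evaluation budget being divided between two networks; the function name `split_budget`, its parameters, and the fixed-share rule are assumptions made for this example and are not the allocation scheme used in the thesis or in the Multiple Policy Value MCTS paper.

```python
def split_budget(total_budget, cost_small, cost_large, share_small):
    """Return (n_small, n_large): how many evaluations each network can afford.

    total_budget : total cost available for one move (arbitrary units)
    cost_small   : cost of one small-network evaluation (hypothetical value)
    cost_large   : cost of one large-network evaluation (hypothetical value)
    share_small  : fraction of the budget given to the small network, in [0, 1]
    """
    if not 0.0 <= share_small <= 1.0:
        raise ValueError("share_small must be between 0 and 1")
    if cost_small <= 0 or cost_large <= 0:
        raise ValueError("costs must be positive")
    # Floor division: a partially paid evaluation cannot be run.
    n_small = int((total_budget * share_small) // cost_small)
    n_large = int((total_budget * (1.0 - share_small)) // cost_large)
    return n_small, n_large


if __name__ == "__main__":
    # Assumed example: the large network costs 8x as much per evaluation.
    print(split_budget(800, 1, 8, 0.5))
```

With an even split and an 8x cost ratio, the small network gets eight times as many evaluations as the large one, which is the kind of imbalance a combined search has to exploit.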
This study modifies AlphaZero General, the open-source program published by Surag Nair on GitHub, by adding Multiple Policy Value Monte Carlo Tree Search and applying it to Connect Four. Multiprocessing is used to speed up training, and Population Based Training is used to search for better hyperparameters.
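As a rough illustration of how Population Based Training searches for hyperparameters, the sketch below implements one generic exploit/explore step in the style of Jaderberg et al. (2017): the worst-scoring members copy the weights and hyperparameters of a top-scoring member and then perturb the copied hyperparameters. The member layout, the truncation fraction of 0.25, and the perturbation factors 0.8 and 1.2 are assumptions chosen for this example, not settings reported by the thesis.

```python
import copy
import random


def pbt_step(population, rng, truncation=0.25, factors=(0.8, 1.2)):
    """Run one exploit/explore step in place and return the population.

    Each member is a dict (hypothetical layout for this sketch):
        {"weights": <any object>, "hparams": {name: float}, "score": float}
    """
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    cut = max(1, int(len(ranked) * truncation))
    if 2 * cut > len(ranked):
        raise ValueError("population too small for this truncation fraction")
    top, bottom = ranked[:cut], ranked[-cut:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: copy the better member's weights.
        loser["weights"] = copy.deepcopy(winner["weights"])
        # Explore: perturb each copied hyperparameter by a random factor.
        loser["hparams"] = {
            name: value * rng.choice(factors)
            for name, value in winner["hparams"].items()
        }
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    members = [
        {"weights": [i], "hparams": {"lr": 0.01 * (i + 1)}, "score": float(i)}
        for i in range(4)
    ]
    pbt_step(members, rng)
    for m in members:
        print(m)
```

In a self-play setting, the score would typically come from evaluation games between members, and the step would be repeated after each training interval.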
Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. and Hassabis, D. (2016). Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529(7587), pp.484-489.
Browne, C., Powley, E., Whitehouse, D., Lucas, S., Cowling, P., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S. and Colton, S. (2012). A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), pp.1-43.
Allis, V. (1988). A Knowledge-based Approach of Connect-Four: The Game is Solved: White Wins. Report IR-163, Vrije Universiteit Amsterdam.
Project AlphaZero General, https://github.com/suragnair/alpha-zero-general.
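吳天宇 (2019). Implementing the Breakthrough Game Based on the AlphaZero General Framework. Master's thesis, Department of Computer Science and Information Engineering, National Taiwan Normal University.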
Lan, L.-C., Li, W., Wei, T.-H., Wu, I-C. (2019). Multiple Policy Value Monte Carlo Tree Search. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp.4704–4710.
Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W., Donahue, J., Razavi, A., Vinyals, O., Green, T., Dunning, I., Simonyan, K., Fernando, C., and Kavukcuoglu, K. (2017). Population Based Training of Neural Networks. https://arxiv.org/pdf/1711.09846.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., and Hassabis, D. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. https://arxiv.org/pdf/1712.01815.
Wu, T.-R., Wei, T.-H., Wu, I-C. (2020). Accelerating and Improving AlphaZero Using Population Based Training. The Thirty-Fourth AAAI Conference on Artificial Intelligence.
Silberschatz, A., Galvin, P. B., Gagne, G. (2018). Operating System Concepts. Tenth edition. John Wiley & Sons, Inc.
Connect Four game website, https://connect4.gamesolver.org/.