
Author: Tsai, Yi-Sian
Title: Enhancing the Connect Four Program Through the Combination of Multiple Policy Value MCTS and Population Based Training
Advisor: Lin, Shun-Shii
Committee Members: Lin, Shun-Shii; Wu, I-Chen; Yen, Shi-Jim; Chen, Jr-Chang; Chou, Hsin-Hung
Date of Oral Defense: 2023/07/20
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2024
Academic Year of Graduation: 112
Language: Chinese
Number of Pages: 47
Keywords: Computer Games, Connect Four, Deep Learning, AlphaZero, Multiple Policy Value Monte Carlo Tree Search
Research Method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202400125
Thesis Type: Academic thesis

    Computer games are one of the oldest and most famous applications of artificial intelligence in computer science and engineering. AlphaZero is a very powerful reinforcement learning algorithm in the field of computer games.
    AlphaZero combines Monte Carlo Tree Search (MCTS) with deep neural networks. Larger neural networks evaluate positions more accurately, while smaller networks are cheaper and faster, so the two must be balanced under a limited budget. Multiple Policy Value Monte Carlo Tree Search (MPV-MCTS) combines multiple neural networks of different sizes while retaining the advantages of each.
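    To make the budget idea concrete, the following is a minimal Python sketch written against AlphaZero General's MCTS(game, nnet, args) / getActionProb() interface; the simulation counts, the weighted mixing step, and the function name are illustrative assumptions rather than the thesis implementation (the actual MPV-MCTS additionally lets the two searches share tree statistics during search).

```python
# Hypothetical sketch of the MPV budget split: a small network gets many cheap
# simulations, a large network gets a few accurate ones, and their visit-count
# policies are mixed. Not the thesis code; interfaces follow AlphaZero General.
import numpy as np
from MCTS import MCTS        # AlphaZero General's search class
from utils import dotdict    # attribute-style dict used for AZG configs

def mpv_action_probs(game, board, small_net, large_net, weight=0.5):
    """Return a move distribution mixing a fast small net and a slow large net."""
    small_args = dotdict({'numMCTSSims': 800, 'cpuct': 1.0})  # many simulations
    large_args = dotdict({'numMCTSSims': 100, 'cpuct': 1.0})  # few simulations

    pi_small = np.array(MCTS(game, small_net, small_args).getActionProb(board))
    pi_large = np.array(MCTS(game, large_net, large_args).getActionProb(board))

    # Weighted mixture of the two visit-count policies; `weight` trades the
    # small net's speed against the large net's evaluation accuracy.
    return weight * pi_small + (1 - weight) * pi_large
```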
    In this study, we modified Surag Nair's AlphaZero General (AZG) program from GitHub, implemented Multiple Policy Value Monte Carlo Tree Search on top of it, and applied the result to Connect Four. To accelerate training, we employed multiprocessing in the program. Finally, we used Population Based Training (PBT) to search for better hyperparameters.
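    As a rough illustration of these two engineering pieces, the sketch below distributes self-play games over worker processes and applies one PBT exploit/explore step to a small population; run_selfplay, evaluate, and the population layout are hypothetical stand-ins, not the thesis code.

```python
# Hypothetical sketch: parallel self-play via multiprocessing plus a PBT
# exploit/explore step over hyperparameters such as the learning rate or cpuct.
import copy
import random
from multiprocessing import Pool

def parallel_selfplay(run_selfplay, num_games, num_workers=4):
    """Run self-play games in worker processes and gather training examples."""
    with Pool(processes=num_workers) as pool:
        batches = pool.map(run_selfplay, range(num_games))  # one game per task
    return [example for batch in batches for example in batch]

def pbt_step(population, evaluate, explore_factor=1.2, bottom_frac=0.25):
    """population: list of dicts with 'weights' and 'hparams'.
    evaluate(member) -> score, e.g. win rate against a fixed baseline."""
    ranked = sorted(population, key=evaluate)            # worst members first
    cutoff = max(1, int(len(population) * bottom_frac))
    top, bottom = ranked[-cutoff:], ranked[:cutoff]

    for loser in bottom:
        winner = random.choice(top)
        # Exploit: copy the stronger member's network weights and hyperparameters.
        loser['weights'] = copy.deepcopy(winner['weights'])
        loser['hparams'] = dict(winner['hparams'])
        # Explore: perturb each copied hyperparameter up or down.
        for key in loser['hparams']:
            scale = explore_factor if random.random() < 0.5 else 1.0 / explore_factor
            loser['hparams'][key] *= scale
    return population
```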

    Chapter 1  Introduction
      1.1  Research Background
      1.2  Research Objectives
      1.3  Research Significance
    Chapter 2  Literature Review
      2.1  Connect Four
      2.2  Monte Carlo Tree Search
      2.3  AlphaZero
        2.3.1  The Architecture of AlphaZero
        2.3.2  The MCTS of AlphaZero
      2.4  Multiple Policy Value Monte Carlo Tree Search
      2.5  Population Based Training
      2.6  Multiprocessing
    Chapter 3  Methods and Procedures
      3.1  AlphaZero General
      3.2  Applying MPV-MCTS to AZG
      3.3  Accelerating Training with Multiprocessing
      3.4  Applying PBT to MPV-AZG
    Chapter 4  Experiments and Results
      4.1  Environment and Parameter Settings
      4.2  Experimental Design
      4.3  Experimental Results
        4.3.1  Comparison of MPV-AZG Versions with Different Neural Network Size Ratios
        4.3.2  Comparison of the MPV-AZG and AZG Versions
        4.3.3  Accelerating the MPV-AZG Version with Multiprocessing
        4.3.4  Comparison of the PBT-AZG and MP-AZG Versions
    Chapter 5  Conclusions and Future Work
    References

    Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. and Hassabis, D. (2016). Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529(7587), pp.484-489.
    Browne, C., Powley, E., Whitehouse, D., Lucas, S., Cowling, P., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S. and Colton, S. (2012). A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), pp.1-43.
    Allis, L. V. (1988). A Knowledge-based Approach of Connect-Four: The Game Is Solved: White Wins. Report IR-163, Vrije Universiteit Amsterdam.
    Project AlphaZero General, https://github.com/suragnair/alpha-zero-general.
    Wu, T.-Y. (2019). Implementing the Breakthrough Game Based on the AlphaZero General Framework. Master's thesis, Department of Computer Science and Information Engineering, National Taiwan Normal University. (In Chinese)
    Lan, L.-C., Li, W., Wei, T.-H., Wu, I-C. (2019). Multiple Policy Value Monte Carlo Tree Search. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp.4704–4710.
    Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W., Donahue, J., Razavi, A., Vinyals, O., Green, T., Dunning, I., Simonyan, K., Fernando, C., and Kavukcuoglu, K. (2017). Population Based Training of Neural Networks. https://arxiv.org/pdf/1711.09846.
    Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., and Hassabis, D. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. https://arxiv.org/pdf/1712.01815.
    Wu, T.-R., Wei, T.-H., Wu, I-C. (2020). Accelerating and Improving AlphaZero Using Population Based Training. The Thirty-Fourth AAAI Conference on Artificial Intelligence.
    Silberschatz, A., Galvin, P. B., and Gagne, G. (2018). Operating System Concepts, 10th edition. John Wiley & Sons, Inc.
    Connect Four game solver website, https://connect4.gamesolver.org/.

    Electronic full text embargoed until 2025/02/01.