Graduate student: | 饒鏞 Jao, Yung |
---|---|
Thesis title: | MuZero 演算法結合連續獲勝走步改良外圍開局五子棋程式 Combining MuZero Algorithm with Consecutive Winning Moves to Improve the Outer-Open Gomoku Program |
Advisor: | 林順喜 Lin, Shun-Shii |
Oral defense committee: | 許舜欽 Hsu, Shun-Chin; 吳毅成 Wu, I-Chen; 顏士淨 Yen, Shi-Jim; 陳志昌 Chen, Jr-Chang; 張紘睿 Chang, Hung-Jui; 林順喜 Lin, Shun-Shii |
Oral defense date: | 2022/08/03 |
Degree: | Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Year of publication: | 2022 |
Academic year of graduation: | 110 |
Language: | Chinese |
Number of pages: | 47 |
Keywords (Chinese): | MuZero、神經網路、迫著搜索、連續獲勝走步 |
Keywords (English): | MuZero, Neural Network, Threats-Space Search, Consecutive Winning Moves |
Research methods: | Experimental design, comparative study |
DOI URL: | http://doi.org/10.6345/NTNU202201075 |
Thesis type: | Academic thesis |
Usage counts: | Views: 146, Downloads: 37 |
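In 2019, the MuZero algorithm developed by DeepMind, which learns with "zero knowledge", carried artificial intelligence toward more general research domains. The original Outer-Open Gomoku program developed with this algorithm, Muzero-general, estimates only the terminal state of the game during model training, which adds considerable uncertainty to training. This study therefore attempts to improve that Outer-Open Gomoku program with consecutive winning moves.
Threat moves are a very important means of winning in Outer-Open Gomoku, and consecutive winning moves are the winning moves obtained when threat moves are used correctly. Following the principle of consecutive winning moves, this study designs three improved methods. They differ in whether consecutive winning moves found by threat-space search are provided during play, and in how different threat-space search designs are combined with rewards for consecutive winning moves in different situations.
The experimental results show that, for the same training time, all three methods improve on the original version, and the method whose threat-space search design adds active attacking moves plays the strongest.
Keywords: MuZero, neural network, threat-space search, consecutive winning moves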
In 2019, the MuZero algorithm developed by DeepMind used "zero-knowledge" learning to bring artificial intelligence to a more general research field. The original version of the Outer-Open Gomoku program built on Muzero-general, which is based on this algorithm, estimates only the ending state of the game during training, which adds considerable uncertainty to training. This study therefore attempts to improve the Outer-Open Gomoku program with consecutive winning moves.
Threat moves are a very important way to win in Outer-Open Gomoku, and consecutive winning moves are the winning moves obtained from the correct use of threat moves. By combining the MuZero algorithm with consecutive winning moves, this study designs three different improvement methods.
The experimental results show that, under the same training time, all three methods improve on the original version. Among them, the second method, in which the threat moves include active offensive moves, is the strongest.
Keywords: MuZero, Neural Network, Threats-Space Search, Consecutive Winning Moves
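To make the idea of consecutive winning moves concrete, the following is a minimal sketch of a search for a forced win through successive "four" threats. It is not the thesis's implementation: the function names (`makes_five`, `winning_cells`, `forced_win`), the dictionary board representation, and the simplified rule that five or more stones in a row win are all assumptions made for illustration, and the opening restriction of Outer-Open Gomoku is not modelled.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
Board = Dict[Cell, str]  # hypothetical representation: occupied cell -> "X" or "O"
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def makes_five(board: Board, cell: Cell, player: str) -> bool:
    """True if `player` placing a stone on `cell` gives five or more in a row."""
    for dr, dc in DIRECTIONS:
        count = 1
        for sign in (1, -1):
            r, c = cell[0] + sign * dr, cell[1] + sign * dc
            # Off-board cells are never in the dict, so no bounds check is needed.
            while board.get((r, c)) == player:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        if count >= 5:
            return True
    return False


def winning_cells(board: Board, player: str, size: int) -> List[Cell]:
    """Every empty cell (row-major order) where `player` would complete five."""
    return [(r, c) for r in range(size) for c in range(size)
            if (r, c) not in board and makes_five(board, (r, c), player)]


def forced_win(board: Board, attacker: str, defender: str,
               size: int, depth: int) -> Optional[List[Cell]]:
    """Search for a win by consecutive fours within `depth` attacker moves.

    Returns the attacker's moves in order, or None if none is found.
    The board is restored to its original state before returning.
    """
    wins_now = winning_cells(board, attacker, size)
    if wins_now:
        return [wins_now[0]]
    if depth <= 1:
        return None
    for r in range(size):
        for c in range(size):
            move = (r, c)
            if move in board:
                continue
            board[move] = attacker
            try:
                if winning_cells(board, defender, size):
                    continue  # the defender could win first, so this line fails
                threats = winning_cells(board, attacker, size)
                if not threats:
                    continue  # not a forcing move
                if len(threats) >= 2:
                    return [move, threats[0]]  # only one threat can be blocked
                block = threats[0]  # the defender's only way to avoid losing
                board[block] = defender
                try:
                    rest = forced_win(board, attacker, defender, size, depth - 1)
                finally:
                    del board[block]
                if rest is not None:
                    return [move] + rest
            finally:
                del board[move]
    return None


# Two open threes crossing at (4, 4) on a small 9x9 board.
example = {(4, 1): "X", (4, 2): "X", (4, 3): "X",
           (1, 4): "X", (2, 4): "X", (3, 4): "X"}
print(forced_win(example, "X", "O", size=9, depth=3))
```

In this example the search finds a line that starts at the crossing point (4, 4), where one stone creates several fours at once. The sketch only considers fours; a fuller threat-space search would also consider weaker threats such as open threes.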