
Graduate student: Jiang, Syun-Yu (江曛宇)
Thesis title: Evaluating Chinese Checkers Programs Using Heuristics and Several Training Strategies (利用啟發式法則與數種訓練策略來評估中國跳棋程式)
Advisor: Lin, Shun-Shii (林順喜)
Committee members: Wu, I-Chen (吳毅成); Yen, Shi-Jim (顏士淨); Chen, Jr-Chang (陳志昌); Chou, Hsin-Hung (周信宏); Lin, Shun-Shii (林順喜)
Oral defense date: 2023/06/28
Degree: Master
Department: Computer Science and Information Engineering
Year of publication: 2023
Graduation academic year: 111 (2022–2023)
Language: Chinese
Pages: 75
Keywords: Computer Games, Chinese Checkers, Monte Carlo Tree Search, Deep Learning, Reinforcement Learning, Heuristics
Research method: Experimental design
DOI URL: http://doi.org/10.6345/NTNU202301091
Thesis type: Academic thesis
Abstract:
Chinese Checkers is a well-known and challenging board game with perfect information. Unlike some other traditional games, such as Gomoku and Go, the search space of its game tree does not shrink as the game progresses. With a plain AlphaZero-style algorithm, it is difficult to train a program to even beginner strength within a reasonable time. Monte Carlo Tree Search combined with deep learning and reinforcement learning has previously been applied to Chinese Checkers, but there is still room for improvement. Properly incorporating prior knowledge of Chinese Checkers should further improve playing strength.

In this work, we present an approach that combines Monte Carlo Tree Search, deep learning, and reinforcement learning with several heuristics. We modify the design of the predecessor program Jump and manually add prior knowledge in order to improve its strength. Furthermore, we propose a series of strategies to address the weakness of the program in the early stage of neural-network training, so that it can reach a reasonable playing strength without any hand-made training data and without pre-training. Finally, we analyze and discuss the advantages and disadvantages of each strategy.
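The abstract does not spell out how the heuristics enter the search, and the thesis body is not reproduced here. As a minimal sketch only, the code below shows one common way such hand-crafted move scores can be combined with an AlphaZero-style policy, by blending them into the PUCT prior; the constants, names, and mixing scheme are assumptions for illustration, not the thesis's actual method.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch, not the thesis's code: blend a hand-crafted
# heuristic into the PUCT prior of an AlphaZero-style MCTS.
C_PUCT = 1.5   # exploration constant (assumed value)
LAM = 0.3      # heuristic weight vs. the network policy (assumed)

@dataclass
class Edge:
    prior: float            # blended prior probability for this move
    visits: int = 0
    value_sum: float = 0.0

def mixed_priors(policy, heuristic):
    """Blend the network policy with normalized heuristic scores."""
    total = sum(heuristic) or 1.0
    return [(1 - LAM) * p + LAM * h / total
            for p, h in zip(policy, heuristic)]

def select(edges, parent_visits):
    """Standard PUCT selection: argmax over Q + U with the blended prior."""
    def score(e):
        q = e.value_sum / e.visits if e.visits else 0.0
        u = C_PUCT * e.prior * math.sqrt(parent_visits) / (1 + e.visits)
        return q + u
    return max(edges, key=score)

# Usage: three candidate moves; the heuristic (e.g. forward jump
# distance) favors the third, shifting its blended prior upward.
policy = [0.5, 0.3, 0.2]       # from the neural network
heuristic = [0.0, 1.0, 3.0]    # toy hand-crafted move scores
edges = [Edge(prior=p) for p in mixed_priors(policy, heuristic)]
print(round(select(edges, parent_visits=10).prior, 3))  # 0.365
```

Setting LAM to 0 recovers plain AlphaZero-style selection; decaying it over the course of training is one plausible way to rely on the heuristic only while the network is still weak.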

Table of Contents:
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Figures v
List of Tables vii
Chapter 1 Introduction 1
  1.1 Research Background 1
  1.2 Research Objectives 2
Chapter 2 Literature Review 5
  2.1 Background 5
  2.2 AlphaZero 12
  2.3 Negentropy 14
  2.4 Graph Algorithms 15
  2.5 Heuristics Combined with Deep Reinforcement Learning 16
  2.6 Quality-based Rewards 16
  2.7 Jump 17
  2.8 Issues in Related Programs 23
Chapter 3 Methodology 26
  3.1 Initial Design 26
  3.2 Jump Strategy Design 28
  3.3 Couple (Clustering) Strategy Design 35
  3.4 Quick-Win Strategy Design 38
  3.5 Two-Stage Training 39
  3.6 Nonlinear Jump Strategy 40
Chapter 4 Experimental Results 43
  4.1 Experimental Design 43
  4.2 Selecting the First-Stage Strategy 44
  4.3 Comparison of Second-Stage Strategies 45
  4.4 Analysis of Individual Strategies 58
Chapter 5 Conclusions and Future Work 70
References 72
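The table of contents above names jump-based strategies (Sections 3.2 and 3.6); chained jump moves are also why the Chinese Checkers game tree stays large, as the abstract notes. As background, here is a minimal sketch of enumerating all cells reachable by chained jumps, under assumed axial hex coordinates and an unbounded board; this is not the thesis's board representation.

```python
from collections import deque

# Illustrative only: find every cell a piece can reach by chained jumps,
# using axial hex coordinates and an unbounded board (a simplification;
# the real Chinese Checkers board is bounded and star-shaped).
DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def jump_targets(occupied, start):
    """BFS over chained jumps: hop over an adjacent piece into the empty
    cell directly beyond it, then repeat from each landing cell."""
    seen, frontier = {start}, deque([start])
    while frontier:
        q, r = frontier.popleft()
        for dq, dr in DIRS:
            over = (q + dq, r + dr)          # cell being jumped over
            land = (q + 2 * dq, r + 2 * dr)  # landing cell
            if over in occupied and land not in occupied and land not in seen:
                seen.add(land)
                frontier.append(land)
    seen.discard(start)                      # staying in place is not a move
    return seen

# Usage: pieces at (1,0) and (3,0) enable a double jump from (0,0).
print(jump_targets({(1, 0), (3, 0)}, (0, 0)))  # {(2, 0), (4, 0)}
```

A jump strategy in the thesis's sense would then score or filter such chains (for instance, preferring longer forward chains); the sketch only shows the underlying move enumeration.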

References:

Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. and Hassabis, D. (2016). Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529(7587), pp. 484-489.

    Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T. and Hassabis, D. (2017). Mastering the Game of Go without Human Knowledge. Nature, 550(7676), pp.354-359.

Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K. and Hassabis, D. (2018). Mastering Chess and Shogi by Self-play with a General Reinforcement Learning Algorithm. [online] arXiv. Available at: https://arxiv.org/abs/1712.01815.

    Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., Lillicrap, T. and Silver, D. (2020). Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Nature, 588, pp. 604-609.

    Chinese Checkers Introduction, https://en.wikipedia.org/wiki/Chinese_checkers.

陳俊豪, 陳志宏, 林順喜 (2019). 中國跳棋程式之研發與自我強化學習之探討 [Development of a Chinese Checkers Program and a Study of Self-Play Reinforcement Learning]. Master's thesis, National Taiwan Normal University, Taipei, Taiwan.

陳律濃, 林順喜, 陳志宏 (2019). 中國跳棋對弈平台與AI的實作 [Implementation of a Chinese Checkers Game Platform and AI]. Master's thesis, National Taiwan Normal University, Taipei, Taiwan.

AlphaZero General, created by Surag Nair, https://github.com/suragnair/alpha-zero-general.

李峻丞, 韓永楷 (2022). 中國跳棋的性質及其相關問題之研究 [A Study of the Properties of Chinese Checkers and Related Problems]. Master's thesis, National Tsing Hua University, Hsinchu, Taiwan.

    Liu, Z., Zhou, M., Cao, W., Qu, Q., Yeung, H. W. F., Chung, V. Y. Y. (2019) Towards Understanding Chinese Checkers with Heuristics, Monte Carlo Tree Search, and Deep Reinforcement Learning. The University of Sydney.

Pepels, T., Tak, M. J., Lanctot, M., Winands, M. H. M. (2014). Quality-based Rewards for Monte Carlo Tree Search Simulations. 21st European Conference on Artificial Intelligence, Prague, Czech Republic.

    Hsueh, C.-S., Wu, I-C., Tseng, W.-J., Yen, S.-J., Chen, J.-C. (2016) An Analysis for Strength Improvement of an MCTS-based Program Playing Chinese Dark Chess. Theoretical Computer Science, 644, pp. 63-75.

    He, K., Zhang, X., Ren, S., and Sun, J. (2016) Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 770-778.

Temporal-Difference Learning Introduction: Sutton, R. S. and Barto, A. G., Reinforcement Learning: An Introduction (2nd ed.), https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf.

Chen, C.-H., Wu, W.-L., Chen, Y.-H. and Lin, S.-S. (2018). Some Improvements in Monte Carlo Tree Search Algorithms for Sudden Death Games. The 10th International Conference on Computers and Games, Taipei, Taiwan.

陳志宏 (2022). Improving the AlphaZero Algorithm in the Playing and Training Phases. Ph.D. dissertation, Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei, Taiwan.

Bayesian Elo Rating, https://www.remi-coulom.fr/Bayesian-Elo/.
