簡易檢索 / 詳目顯示

研究生: 黃士傑
Shih-Chieh Huang
論文名稱: 應用於電腦圍棋之蒙地卡羅樹搜尋法的新啟發式演算法
New Heuristics for Monte Carlo Tree Search Applied to the Game of Go
指導教授: 林順喜
Lin, Shun-Shii
Rémi Coulom
Rémi Coulom
學位類別: 博士
Doctor
系所名稱: 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2011
畢業學年度: 99
語文別: 英文
論文頁數: 91
中文關鍵詞: 人工智慧圍棋電腦圍棋蒙地卡羅樹搜尋樹狀結構信賴上界法模擬平衡化時間控制Erica
英文關鍵詞: Artificial Intelligence, Go, computer Go, Monte Carlo Tree Search (MCTS), Upper Confidence bounds applied to Trees (UCT), Simulation Balancing, Time Management, Erica
論文種類: 學術論文
相關次數: 點閱:5794下載:938
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 電腦圍棋的研究開始於1970年,但圍棋程式卻從未曾被人們認為是強大的,直到2006年,當「蒙地卡羅樹搜尋」(Monte Carlo Tree Search)與「樹狀結構信賴上界法」(Upper Confidence bounds applied to Trees)出現之後,情況才開始完全不同。「蒙地卡羅樹搜尋」與「樹狀結構信賴上界法」所帶進的革命強而有力到一個地步,人們甚至開始相信,圍棋程式在10年或者20年之後,將能夠擊敗頂尖的人類棋手。
    在本研究中,我們針對「蒙地卡羅樹搜尋」提出一些新的啟發式演算法,主要有兩方面的貢獻。第一個貢獻,是成功的將「模擬平衡化」(Simulation Balancing)應用到9路圍棋。「模擬平衡化」是一種用來訓練模擬的參數的演算法。Silver與Tesauro在2009年提出這個方法時,只實驗在比較小的盤面上,而我們的實驗結果首先證明了「模擬平衡化」在9路圍棋的有效性,具體方法是證明「模擬平衡化」超越了知名的監督式演算法Minorization-Maximization (MM)大約有90 Elo之多。第二個貢獻是針對19路圍棋,系統式的實驗了各種不同之時間控制的方法。實驗結果清楚的指明,聰明的時間控制方案可以大大的提高棋力。所有的實驗都是執行在我們的圍棋程式ERICA,而ERICA正是得益於這些啟發式演算法與實驗結果,成功取得了2010年電腦奧林匹亞的19路圍棋金牌。

    Research into computer Go started around 1970, but the Go-playing programs were never, in a real sense, considered to be strong until the year 2006, when the brand new search scheme Monte Carlo Tree Search (MCTS) and Upper Confidence bounds applied to Trees (UCT) appeared on the scene. The revolution of MCTS and UCT promoted progress of computer Go to such a degree that people began to believe that after ten or twenty years, Go-playing programs will be able to defeat the top human players.
    In this research, we propose some new heuristics of MCTS focused on two contributions. The first contribution is the successful application of Simulation Balancing (SB), an algorithm for training the parameters of the simulation, to 9×9 Go. SB was proposed by Silver and Tesauro in 2009, but it was only practiced on small board sizes. Our experiments are the first to demonstrate its effectiveness in 9×9 Go by showing that SB surpasses the well-known supervised learning algorithm Minorization-Maximization (MM) by about 90 Elo. The second contribution is systematic experiments of various time management schemes for 19×19 Go. The results indicate that clever time management algorithms can considerably improve playing strength. All the experiments were performed on our Go-playing program ERICA, which benefitted from these heuristics and the experimental results to win the gold medal in the 19×19 Go tournament at the 2010 Computer Olympiad.

    誌謝 I Acknowledgement II 摘要 IV Abstract V Contents VI List of Figures X List of Tables XII Chapter 1 Introduction 1 1.1 Computer Games 1 1.2 The Game of Go 2 1.2.1 History 2 1.2.2 Rules 3 1.3 Computer Go 6 1.4 Summary of the Contributions 8 1.5 Organization of the Dissertation 9 Chapter 2 Background and Related Work 10 2.1 Monte Carlo Go 10 2.2 Monte Carlo Tree Search (MCTS) 11 2.2.1 Selection 12 2.2.2 Expansion 12 2.2.3 Simulation 13 2.2.4 Backpropagation 15 2.3 Upper Confidence Bound Applied to Trees (UCT) 18 2.4 State-of-the-Art Go-Playing Programs 21 2.4.1 Crazy Stone 21 2.4.2 MOGO 23 2.4.3 GNU GO 26 2.4.4 FUEGO 28 2.4.5 The Many Faces of Go 29 2.4.6 ZEN 30 2.4.7 Other Programs 33 Chapter 3 ERICA 34 3.1 Development History 34 3.1.1 First Version Created on May 2008 34 3.1.2 Second Version Created on June 2009 36 3.1.3 Third Version Created on February 2010 38 3.2 MCTS in ERICA 40 3.2.1 Selection 40 3.2.2 Expansion 41 3.2.2.1 Larger Patterns 41 3.2.2.2 Other Features 42 3.2.3 Simulation 42 3.2.3.1 Boltzmann Softmax Playout Policy 42 3.2.3.2 Move Generator 44 3.2.3.3 ForbiddenMove 45 3.2.3.4 ReplaceMove 46 3.2.4 Backpropagation 46 3.2.4.1 Bias RAVE Updates by Move Distance 46 3.2.4.2 Fix RAVE Updates for Ko Threats 47 3.3 KGS Games of ERICA 49 Chapter 4 Monte Carlo Simulation Balancing Applied to 9×9 Go 52 4.1 Introduction 52 4.2 Description of Algorithms 54 4.2.1 Softmax Policy 54 4.2.2 Supervised Learning with MM 54 4.2.3 Policy-Gradient Simulation Balancing (SB) 55 4.3 Experiments 56 4.3.1 ERICA 56 4.3.2 Playout Features 56 4.3.3 Experimental Setting 58 4.3.4 Results and Influence of Meta-Parameters 59 4.4 Comparison between MM and SB Feature Weights 61 4.5 Against GNU Go on the 9×9 Board 63 4.6 Playing Strength on the 19×19 Board 65 4.7. Conclusions 65 Chapter 5 Time Management for Monte Carlo Tree Search Applied to the Game of Go 67 5.1 Introduction 67 5.2 Monte Carlo Tree Search in ERICA and Experiment Setting 68 5.3 Basic Formula 69 5.4 Enhanced Formula Depending on Move Number 70 5.5 Some Heuristics 72 5.5.1 UCT Formula in ERICA 72 5.5.2 Unstable-Evaluation Heuristic 72 5.5.3 Think Longer When Behind 73 5.6 Using Opponent’s Time 74 5.6.1 Standard Pondering 75 5.6.2 Focused Pondering 75 5.6.3 Reducing ThinkingTime According to the Simulation Percentage 77 5.7 Conclusions 78 Chapter 6 Conclusions and Proposals for Future Work 79 6.1 Simulation Balancing (SB) 79 6.2 Time Management 80 6.3 Other Prospects 80 References 82 Appendix A. Publication List 91

    Abramson, B. (1990). Expected-Outcome: A General Model of Static Evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 2, pp. 182-193.

    Allis, V. (1994). Searching for Solutions in Games and Artificial Intelligence. Ph.D. Thesis, University of Limburg, Masstricht, The NetherLands.

    Anderson, D.A. (2009). Monte Carlo search in Games. Technical report, Worcester Polytechnic Institute.

    Audouard, P., Chaslot, G., Hoock, J.-B., Rimmel, A., Perez, J., and Teytaud, O. (2009). Grid coevolution for adaptive simulations; application to the building of opening books in the game of Go. In EvoGames, Tuebingen Allemagne. Springer.

    Auer, P., Cesa-Bianchi, N., Freund, Y. and Schapire, R. E. (1995). Gambling in a rigged casino: the adversarial multi-armed bandit problem. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 322-331, IEEE Computer Society Press, Los Alamitos, CA.

    Auer, P., Cesa-Bianchi, N. and Fischer, P. (2002). Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning Journal, 47, pp. 235-256.

    Baier, H. and Drake, P. (2010). The Power of Forgetting: Improving the Last-Good-Reply Policy in Monte Carlo Go. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 2, pp. 303-309.

    Baudis, P. (2011). Exploration formulas for UCT. Computer-Go mailing list,
    http://www.mail-archive.com/computer-go@dvandva.org/msg02154.html.
    Retrieved at 2011-07-17 T19:45:45+08:00.

    Baudis, P. and Gailly, J.-l. (2011). Pachi: Software for the Board Game of Go / Weiqi / Baduk.
    http://pachi.or.cz/.
    Retrieved at 2011-07-14 T14:38:10+08:00.

    Baum, E. B. and Smith, W. D. (1997). A Bayesian Approach to Relevance in Game Playing. Artificial Intelligence, Vol. 97, No.1–2, pp. 195-242.

    Bourki, A., Chaslot, G., Coulm, M., Danjean, V., Doghmen, H., Hérault, T., Hoock, J.-B., Rimmel, A., Teytaud, F., Teytaud, O., Vayssière, P., and Yu, Z. (2010). Scalability and Parallelization of Monte-Carlo Tree Search. Proceedings of the 5th International Conference on Computer and Games 2010.

    Bouzy, B. (2003). Associating domain-dependent knowledge and Monte Carlo approaches within a Go program. Information Sciences, Vol. 175, pp. 247-257.

    Bouzy, B. (2005a). Move Pruning Techniques for Monte-Carlo Go. In Joint Conference on Information Sciences. Proceedings of the 11th Advances in Computer Games conference (ACG-11), pp. 104-119.

    Bouzy, B. (2005b). History and territory heuristics for Monte-Carlo Go. In Heuristic Search and Computer Game Playing Session, Joint Conference on Information Sciences, Salt Lake City.

    Bouzy, B. and Cazenave, T. (2001). Computer Go: an AI-oriented Survey. Artificial Intelligence, Vol. 132, issues 1, pp. 39-103.

    Bouzy, B. and Chaslot, G. (2005). Bayesian generation and integration of K-nearest-neighbor patterns for 19×19 Go. IEEE 2005 Symposium on Computational Intelligence in Games (eds. G. Kendall & Simon Lucas), pp. 176-181.

    Bouzy, B. and Chaslot, G. (2006). Monte-Carlo Go Reinforcement Learning Experiments. 2006 IEEE Symposiumon Computational Intelligence and Games (eds. G. Kendall and S. Louis), pp. 187-194, Reno, USA.

    Bouzy, B. and Helmstetter, B. (2003). Monte-Carlo Go Developments. Proceedings of the 10th Advances in Computer Games conference (ACG-10), Graz 2003, Book edited by Kluwer, H. Jaap van den Herik, Hiroyuki Iida, Ernst A. Heinz, pp. 159-174.

    Brügmann, B. (1993). Monte Carlo Go. Not formally published.
    http://www.althofer.de/Bruegmann-MonteCarloGo.pdf.
    Retrieved at 2011-07-18 T10:16:45+08:00.

    Burmeister, J. and Wiles, J. (1995). The Challenge of Go as a Domain for AI Research: a Comparison Between Go and Chess. Proceedings of the Third Australian and New Zealand Conference on Intelligent Information System, pp. 181-186.

    Campbell, M., Hoane, A. J. and Hsu, F. H. (2002). Deep Blue. Artificial Intelligence, Vol. 134, issues 1-2, pp. 57-83.

    Chaslot, G., Fiter, C., Hoock, J.-B., Rimmel, A., and Teytaud, O. (2009). Adding Expert Knowledge and Exploration in Monte-Carlo Tree Search. Proceedings of the Twelfth International Advances in Computer Games Conference, pp. 1-13, Pamplona, Spain.

    Chaslot, G., J-B. Hoock, J.-B., Perez, J., Rimmel, A., Teytaud, O. and Winands, M. (2009). Meta Monte-Carlo Tree Search for Automatic Opening Book Generation. Proceedings of the IJCAI'09 Workshop on General Intelligence in Game Playing Agents, pp. 7-12.

    Chaslot, G., Winands, M., Bouzy, B., Uiterwijk, J. W. H. M., and Herik, H. J. van den (2007). Progressive Strategies for Monte-Carlo Tree Search. Proceedings of the 10th Joint Conference on Information Sciences (ed.P. Wang), pp. 655–661, Salt Lake City, USA.

    Chaslot, G., Winands, M. and Herik, H. J. van den (2008). Parallel Monte-Carlo Tree Search. Proceedings of the Conference on Computers and Games 2008, Vol. 5131 of Lecture Notes in Computer Science, pp. 60-71.

    Chen, K. (1989). Group identification in Computer Go. Heuristic Programming in Artificial Intelligence, Levy & Beal ( Eds.), pp. 195-210.

    Chen, K. and Chen, Z. (1999). Static analysis of life and death in the game of Go. Information Science, Vol. 121, pp. 113-134.

    Coquelin, P.-A. and Munos, R. (2007). Bandit Algorithm for Tree Search, Technical Report 6141, INRIA.

    Coulom, R. (2006). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. Proceedings of the 5th International Conference on Computer and Games (eds. H. J. van den Herik, P. Ciancarini, and H. J. Donkers), Vol. 4630 of Lecture Notes in Computer Science, pp. 72-83, Springer, Turin, Italy.

    Coulom, R. (2007). Computing Elo Ratings of Move Patterns in the Game of Go. ICGA Journal, Vol. 30, No. 4, pp. 198-208.

    Coulom, R. (2010). Bayesian Elo Rating.
    http://remi.coulom.free.fr/Bayesian-Elo/#usage.
    Retrieved at 2011-07-17 T17:19:35+08:00.

    Drake, P. (2009). The Last-Good-Reply Policy for Monte-Carlo Go. ICGA Journal Vol. 32, pp. 221-227.

    Drake, P. (2011). Orego.
    http://legacy.lclark.edu/~drake/Orego.html.
    Retrieved at 2011-07-14 T14:43:55+08:00.

    Enzenberger, M. and Müller, M. (2009). A Lock-Free Multithreaded Monte-Carlo Tree Search Algorithm. In Advances in Computer Games (ACG12), pp. 14-20, Pamplona Espagne, Berlin: Springer.

    Enzenberger, M., Müller, M., Arneson B. and Segal R. (2010). Fuego - An Open-Source Framework for Board Games and Go Engine Based on Monte Carlo Tree Search. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 2, No. 4, pp. 259-270. Special issue on Monte Carlo Techniques and Computer Go.

    Fotland, D. (2011). Exploration formulas for UCT.
    http://www.mail-archive.com/computer-go@dvandva.org/msg02143.html.
    Retrieved at 2011-07-17 T19:47:50+08:00.

    Fotland, D. (2010). ERICA wins Go 19x19 Tournament. ICGA Journal, Vol. 33, pp. 174-178.

    Fotland, D. (2002). Static Eye in "The Many Faces of Go". ICGA Journal, Vol. 25, No. 4, pp. 203-210.

    Fotland, D. (1996). World Computer Go Championships.
    http://www.smart-games.com/worldcompgo.html.
    Retrieved at 2011-07-17 T19:44:35+08:00.

    Fotland, D. (1993). Knowledge Representation in The Many Faces of Go.
    http://www.smart-games.com/knowpap.txt.
    Retrieved at 2011-07-17 T19:43:01+08:00.

    Gelly, S., Hoock, J.-B., Rimmel, Arpad, Teytaud, O. and Kalemkarian, Y. (2008). The Parallelization of Monte-Carlo Planning. In International Conference on Informatics in Control, Automation and Robot, Madeira, Portugal.

    Gelly, S. and Silver, D. (2007). Combining Online and Offline Knowledge in UCT. Proceedings of the 24th International Conference on Machine Learning, pp. 273-280, Corvallis Oregon USA.

    Gelly, S. and Silver, D. (2011). Monte-Carlo Tree Search and Rapid Action Value Estimation in Computer Go. Artificial Intelligence, Vol. 175, No. 11, pp. 1856-1875.

    Gelly, S., Wang, Y., Munos, R. and Teytaud, O. (2006). Modifications of UCT with Patterns in Monte-Carlo Go. Technical Report 6062, INRIA.

    Gaudel, R., Hoock, J.-B., Pérez, J., Sokolovska, N., Teytaud, O. (2010). A Principled Method for Exploiting Opening Books. Proceedings of the 5th International Conference on Computer and Games 2010, pp. 136-144.

    Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. Readings in computer vision: issues, problems, principles, and paradigms, pp. 564-584.

    Graepel, T., Goutrié, M., Krüger, M. and Herbrich R. (2001). Learning on Graphs in the Game of Go. Lecture Notes in Computer Science, Vol. 2130/2001, pp. 347-352.

    Herik, H. J. van den, Uiterwijk, J. W. H. M., Rijswijck, J. van. (2002). Games solved: Now and in the future. Artificial Intelligence, Vol. 134, pp. 277-311.

    Helmbold, D. P. and Wood, A. P. (2009). All-Moves-As-First Heuristics in Monte-Carlo Go. In Proceedings of the 2009 International Conference on Artificial Intelligence, ICAI’09/ISBN 1-60132-108-2, Editors: Arabnia, de la Fuente, Olivas, pp. 605-610, Los Vegas, USA.

    Hendrik, B. (2010). Adaptive Playout Policies for Monte-Carlo Go. Master thesis, Institut für Kognitionswissenschaft, Universität Osnabrück.

    Kocsis, L. and Szepesv´ari, C. (2006). Bandit Based Monte-Carlo Planning. In J. Furnkranz, T. Scheffer and M. Spiliopoulou (eds.), Machine Learning: ECML 2006, Lecture Notes in Artificial Intelligence 4212, pp. 282-293.

    Jasiek, R. (1997). Elementary Rules.
    http://home.snafu.de/jasiek/element.html.
    Retrieved at 2011-07-17 T19:42:20+08:00.

    Huang, S. C., and Yen, S. J. (2010). Many Faces of Go Wins Computer 13×13 Go Tournament. ICGA Journal, Vol. 33, No. 3, pp. 172-173.

    Hyatt, R. M. (1984). Using time wisely. ICCA Journal, Vol. 7, No. 1, pp. 4-9.

    Kato, H. (2008). More UCT / Monte-Carlo questions (Effect of rave). Computer-Go mailing list.
    http://www.mail-archive.com/computer-go@computer-go.org/msg07135.html.
    Retrieved at 2011-07-17 T19:48:49+08:00.

    Knuth, D. E., and Moore, R. W. (1975). An Analysis of Alpha-Beta Pruning, Artificial Intelligence, Vol. 6, No. 4, pp. 293-326.

    Lew, L. (2010). Library of effective Go routines.
    http://github.com/lukaszlew/libego.
    Retrieved at 2011-07-14 T14:27:45+08:00.

    Lichtenstein, D. and and Sipser, M. (1978). Go is PSPACE-hard. Foundations of Computer Science, pp. 48-54.

    Lin, C. H. (2009). Web2Go web site.
    http://web2go.board19.com/.
    Retrieved at 2011-07-26 T12:07:05+08:00.

    Markovitch, S. and Sella, Y. (1996). Learning of resource allocation strategies for game playing. Computational Intelligence, vol. 12, pp. 88-105.

    Müller, M. (2002). Computer Go. Artificial Intelligence, Vol. 134, issues 1-2, pp. 145-179.

    Persson, M. (2010). Valkyria. Sensei’s Library.
    http://senseis.xmp.net/?Valkyria.
    Retrieved at 2011-07-14 T14:33:12+08:00.

    Rimmel, A., Teytaud, F. and Teytaud, O. (2010). Biasing Monte-Carlo Simulations through RAVE Values. In Proceedings of the 5th International Conference on Computer and Games 2010, pp. 59-68.

    Rosin, C. D. (2010). Multi-armed bandits with episode context. Proceedings of ISAIM 2010.

    Schaeffer, J. (1989). The history heuristic and alpha-beta search enhancements in practice. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, pp. 1203-1212.

    Segal, R. (2010). On the Scalability of Parallel UCT. Proceedings of the 5th International Conference on Computer and Games 2010.

    Shannon, C. E. (1950). Programming a computer for playing chess. Philosophical Magazine, 7th series, Vol. 41, No. 314, pp. 256-275.

    Sheppard, B., Williams, M., Dailey, D., Hillis, D., Nentwich, D., Persson, M., Fotland, D. and Boon, M. (2009). Tweak to MCTS selection criterion. Computer-go mailing list,http://groups.google.com/group/computer-go-archive/browse_thread/thread/7531bef033ca31f6, Jun. 2009.

    Silver, D. (2009). Reinforcement learning and simulation-based search in computer Go. Ph.D. dissertation, University of Alberta.

    Slate, D. J., and Atkin, L. R. (1977). Chess 4.5 - The Northwestern University Chess Program, Chess Skill in Man and Machine, P.W. Frey (ed.), Springer-Verlag, New York, pp. 82-118.

    Smith, A. (1908). The game of Go, the national game of Japan.

    Šolak, R. and Vučković, R. (2009). Time management during a chess game. ICGA Journal, vol. 32, No. 4, pp. 206-220.

    Stern, D., Herbrich, R. and Graepel, T. (2006). Bayesian Pattern Ranking for Move Prediction in the Game of Go. In Proceedings of the International Conference of Machine Learning.

    Stogin, J., Chen, Y.-P., Drake, P., and Pellegrino, S. (2009). The Beta Distribution in the UCB Algorithm Applied to Monte-Carlo Go. In Proceedings of the 2009 International Conference on Artificial Intelligence, CSREA Press.

    Tesauro, G., Rajan, V. T., Segal, R. (2010). Bayesian Inference in Monte-Carlo Tree Search , In Proceedings of the Conference on Uncertainties in Artificial Intelligence 2010.

    Teytaud, O. (2011). News on Tromp-Cook.
    http://www.mail-archive.com/computer-go@dvandva.org/msg02171.html.
    Retrieved at 2011-07-17 T19:50:20+08:00.

    Tromp, J. and Farnebäck, G. (2006). Combinatorics of Go. Proceedings of 5th international Conference on Computer and Games, pp. 241-249.

    Xie, F., Liu, Z. (2009). Backpropagation Modification in Monte-Carlo Game Tree Search. Intelligent Information Technology Application, 2009 (IITA 2009), pp. 125-128.

    Yajima, T., Hashimoto, T., Matsui, T., Hashimoto, J. and Spoerer, K. (2010). Node Expansion Operators for the UCT Algorithm. Proceedings of the 5th International Conference on Computer and Games 2010, pp. 116-123.

    Yamashita, H. (2010). time settings. Computer-Go mailing list.
    http://dvandva.org/pipermail/computer-go/2010-July/000687.html.
    Retrieved at 2011-07-17 T19:50:56+08:00.

    Yamashita, H. (2011). Exploration formulas for UCT. Computer-Go mailing list.
    http://www.mail-archive.com/computer-go@dvandva.org/msg02155.html.
    Retrieved at 2011-07-17 T19:51:22+08:00.

    Yamato. (2011). News on Tromp-Cook.
    http://www.mail-archive.com/computer-go@dvandva.org/msg02184.html.
    Retrieved at 2011-07-17 T19:51:43+08:00.

    Yen, S. J. (1999). Design and Implementation of Computer Go Program Jimmy 5.0. Ph.D. dissertation, National Taiwan University.

    Yoshimoto, H., Yoshizoe, K., Kaneko, T., Kishimoto, A. and Taura, K. (2006). Monte Carlo Go Has a Way to Go. In Proceedings of the 21st National Conference on Artificial Intelligence, AAAI Press, pp. 1070-1075.

    Zobrist, A. (1969). A model of visual organization for the game of Go. Proceedings of AFIPS Spring Joint Computer Conference, Boston, AFIPS Press, Montvale, NJ, pp. 103-111.

    Zobrist, A. (1970). Feature extractions and representation for pattern recognition and the game of Go. Ph.D. Thesis, Graduate School of the University of Wisconsin, Madison, WI.

    下載圖示
    QR CODE