
臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)


Detailed Record

Author: 偕為昭
Author (English): Jie, Wei-Zhao
Thesis Title: 強化學習與遷移學習應用於六貫棋遊戲
Thesis Title (English): Investigating Reinforcement Learning and Transfer Learning in Hex Game
Advisor: 林順喜
Advisor (English): Lin, Shun-Shii
Committee Members: 吳毅成、顏士淨、陳志昌、周信宏、林順喜
Committee Members (English): Wu, I-Chen; Yen, Shi-Jim; Chen, Jr-Chang; Chou, Hsin-Hung; Lin, Shun-Shii
Oral Defense Date: 2023-06-28
Degree: Master's
Institution: National Taiwan Normal University
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2023
Graduation Academic Year: 111 (ROC calendar)
Language: Chinese
Number of Pages: 46
Keywords (Chinese): 六貫棋、強化學習、遷移學習
Keywords (English): Hex, Reinforcement Learning, Transfer Learning
Abstract (translated from Chinese):
Hex is a two-player board game. It first appeared in a Danish newspaper in 1942 under the name Polygon, was independently reinvented in 1948 by the American mathematician John Forbes Nash Jr. and called Nash, and was finally published in 1952 by the manufacturer Parker Brothers, which named it Hex. On the board, each pair of opposite sides (top/bottom and left/right) is marked with one color; the two players take turns placing stones and try to connect the pair of sides in their own color to win. Hex is a zero-sum game and cannot end in a draw. Previous research has solved the game on boards of size 9×9 and smaller.
With the advent of AlphaZero, computer game-playing programs have advanced considerably, and programs developed with this method play at a strong level. In Hex, the Mohex program developed at the University of Alberta deserves special mention: it has consistently achieved excellent results in competitions and is still being improved today.
This study attempts to perform reinforcement learning with the AlphaZero training framework, assisted by board positions solved by Mohex. Because training a model for larger boards is costly, we combine the framework with transfer learning and use data from already-solved small boards, so that the early self-play stage produces better game records instead of starting from zero knowledge, thereby improving the training results of the large-board model. We also compare the effects of different parameter-transfer methods used during transfer learning.
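As a concrete illustration of how a solved position could be turned into supervised training data for the AlphaZero pipeline (Section 3.3.1 of the table of contents), the minimal sketch below encodes the solver's best move as a one-hot policy target and the proven winner as the value target. The function and variable names are illustrative assumptions, not code from the thesis.

```python
import numpy as np

BOARD_SIZE = 7  # illustrative size for a small, already-solved board (assumption)

def solved_position_to_example(board, best_move, winner, player):
    """Turn one solved Hex position into an AlphaZero-style (state, policy, value) tuple.

    board     : (BOARD_SIZE, BOARD_SIZE) array with entries +1, -1, 0
    best_move : (row, col) proven best move reported by the solver
    winner    : +1 or -1, the side with a proven win from this position
    player    : +1 or -1, the side to move
    """
    # One-hot policy target: all probability mass on the proven best move.
    policy = np.zeros(BOARD_SIZE * BOARD_SIZE, dtype=np.float32)
    policy[best_move[0] * BOARD_SIZE + best_move[1]] = 1.0

    # Value target from the side-to-move's point of view.
    value = 1.0 if winner == player else -1.0
    return board.astype(np.float32), policy, value
```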
Abstract (English):
Hex is a two-player board game that first appeared in a Danish newspaper in 1942 under the name Polygon. In 1948, the American mathematician John Forbes Nash Jr. independently reinvented the game and called it Nash. Finally, in 1952, it was published by the manufacturer Parker Brothers and renamed Hex. On the board, each pair of opposite sides (top/bottom and left/right) is marked with a different color. Players take turns placing pieces, trying to connect the pair of sides marked with their own color in order to win. Hex is a zero-sum game, and a tie is impossible. In previous research, the game has been solved for boards of size 9×9 and smaller.
With the advent of AlphaZero, computer game-playing programs have advanced further, and programs developed with this method show strong performance. In Hex, the program Mohex, developed at the University of Alberta, is particularly noteworthy: it has achieved excellent results in competitions and its strength continues to be improved.
This thesis uses the AlphaZero framework for reinforcement learning, assisted by board positions solved by Mohex. Since training a model for larger boards requires more resources, we combine the framework with transfer learning and use games solved on smaller boards, so that the early self-play stage already produces better game records rather than starting from zero knowledge. In this way we try to improve the training results of the model for larger boards. Additionally, we compare the effects of different parameter-transfer methods during transfer learning.
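The parameter transfer described above, in which layers of a small-board model are reused to initialize a large-board model, could be sketched in PyTorch (which the thesis uses) roughly as follows. This is a minimal sketch of only the simpler scheme of Section 3.6.1: tensors whose names and shapes match are copied, while size-mismatched layers keep their fresh initialization. The function name and the assumption that the two networks share layer names are hypothetical, not taken from the thesis.

```python
import torch.nn as nn

def transfer_compatible_layers(small_model: nn.Module, large_model: nn.Module):
    """Copy every parameter tensor whose name and shape match between the two
    models; leave the remaining tensors (typically the fully connected layers,
    whose sizes depend on the board dimensions) at their fresh initialization."""
    small_state = small_model.state_dict()
    large_state = large_model.state_dict()
    copied = []
    for name, tensor in small_state.items():
        if name in large_state and large_state[name].shape == tensor.shape:
            large_state[name] = tensor.clone()
            copied.append(name)
    large_model.load_state_dict(large_state)
    return copied
```

Mapping the parameters of size-mismatched layers to "similar positions" (Section 3.6.2) would require an additional, board-size-aware copy step instead of the plain shape check used here.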
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Objectives 3

Chapter 2 Literature Review 4
2.1 Hex Game Strategies 4
2.2 AlphaZero 5
2.3 Transfer Learning 7
2.4 Mohex 9
2.5 Convolutional Neural Networks 11
2.5.1 Convolutional Layer 11
2.5.2 Pooling Layer 12
2.5.3 Fully Connected Layer 13
2.6 The alpha-zero-general Open-Source Code 15

Chapter 3 Methods and Procedures 16
3.1 Implementing Hex in alpha-zero-general 16
3.1.1 Board Design and Win Detection 16
3.1.2 Symmetric Boards 17
3.2 Neural Network Architecture 18
3.3 Generating Training Data with Mohex 19
3.3.1 Converting Best Moves into Training Data 21
3.3.2 Positions without a Winning Move 22
3.3.3 Flipping Boards to Obtain More Training Data 23
3.4 Training with the Original AlphaZero 24
3.5 Model Pre-training and Layer Transfer 25
3.6 Parameter Mapping and Handling during Layer Transfer 26
3.6.1 Directly Initializing Layers with Different Parameter Counts 27
3.6.2 Mapping Parameters to Similar Positions 27
3.7 Loading the Parameter-Transferred Model into alpha-zero-general 32

Chapter 4 Experimental Results 33
4.1 Experimental Environment 33
4.2 Validating the Conversion of Solved-Position Information into Training Data 34
4.3 AlphaZero-Framework Training with Pre-trained Parameter Models 36
4.3.1 Comparison between the Version Not Using All Pre-trained Parameters and the Original 36
4.3.2 Comparison between the Version Using All Pre-trained Parameters and the Original 38
4.3.3 Comparison between Method 1 and Method 2 39
4.4 Different Mapping Schemes for Parameter Transfer 42
4.5 Playing against Mohex 43
Chapter 5 Conclusions and Future Directions 44
References 45