(34.201.11.222) 您好!臺灣時間:2021/02/25 05:01
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:何君彥
研究生(外文):Chun-Yen Ho
論文名稱:機器學習於合約橋牌叫牌上之應用
論文名稱(外文):Contract Bridge Bidding by Learning
指導教授:林軒田
指導教授(外文):Hsaun-Tien Lin
口試委員:李育杰林守德
口試委員(外文):Yuh-Jye LeeShou-De Lin
口試日期:2014-07-04
學位類別:碩士
校院名稱:國立臺灣大學
系所名稱:資訊工程學研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2014
畢業學年度:102
語文別:英文
論文頁數:28
中文關鍵詞:機器學習合約橋牌情境式拉霸問題信心值上界成本導向分類器
外文關鍵詞:Machine LearningContract BridgeContextual Bandit ProblemUpper-Confidence BoundCost-Sensitive Classification
相關次數:
  • 被引用被引用:0
  • 點閱點閱:462
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
合約橋牌是一種具有不完全資訊特性的遊戲,電腦在此遊戲中通常無法勝過人類的橋牌專家。其中,人類橋牌玩家的叫牌決定對於電腦程式而言特別難以模仿,這使得自動化叫牌仍然是一個具挑戰性的研究問題。另一方面,使用不模仿人類玩家的方法進行自動化叫牌的可能性目前尚未被充份研究,在這篇論文中,我們在無競叫叫牌問題上首先探討使用此種方法的可能性。我們提出一個獨創的機器學習架構以使電腦程式學習自己的叫牌決定。在這個架構下,我們將叫牌問題轉換為機器學習問題,並精心設計一個基於成本導向分類器和信心值上界演算法的模型以解決此問題。我們以實驗驗證所提出的模型,並發現此模型與模仿人類玩家叫牌決定且多次贏得冠軍的電腦橋牌程式相較具有相當的競爭力。

Contract bridge is an example of an incomplete information game for which computers typically do not perform better than expert human bridge players. In particular, the typical bidding decisions of human bridge players are difficult to mimic with a computer program, and thus automatic bridge bidding remains to be a challenging research problem. Currently, the possibility of automatic bidding without mimicking human players has not been fully studied. In this work, we take an initiative to study such possibility for the specific problem of bidding without competition. We propose a novel learning framework to let a computer program learn its own bidding decisions. The framework transforms the bidding problem into a learning problem, and then solves the problem with a carefully designed model that consists of costsensitive classifiers and upper-confidence-bound algorithms. We validate the proposed model and find that it performs competitively to the champion computer bridge program that mimics human bidding decisions.

口試委員會審定書 i
致謝ii
中文摘要 iii
Abstract iv
Contents v
List of Figures vii
List of Tables viii
1 Introduction 1
2 Problem Setup 5
3 Proposed Model 8
3.1 Baseline and Optimistic Methods . . . . . . . . . . . . . . . . . . . . . . 8
3.2 The Difficulty of Learning to Bid . . . . . . . . . . . . . . . . . . . . . . 9
3.3 The Multi-Layer Bandit Model . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 Additional Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4 Experiments 16
4.1 Baseline and Optimistic Methods . . . . . . . . . . . . . . . . . . . . . . 17
4.2 Effect of Applying Techniques . . . . . . . . . . . . . . . . . . . . . . . 18
4.3 Comparison on Different Model Structures . . . . . . . . . . . . . . . . 20
4.4 Comparison with Wbridge5 . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.5 Effect of Using Real Declarer . . . . . . . . . . . . . . . . . . . . . . . . 22
5 Conclusions and Future Works 23
A Table of Opening Bids 24

[1] Tuomas Sandholm. The state of solving large incomplete-information games, and
application to poker. AI Magazine, 31(4):13–32, 2010.
[2] Michael Bowling, Johannes F&;uuml;rnkranz, Thore Graepel, and Ron Musick. Machine
learning and games. Machine learning, 63(3):211–215, 2006.
[3] Marc JV Ponsen, Jan Ramon, Tom Croonenborghs, Kurt Driessens, and Karl Tuyls.
Bayes-relational learning of opponent models from incomplete information in nolimit
poker. In Proceedings of the AAAI Conference on Artificial Intelligence, pages
1485–1486, 2008.
[4] Lu&;iacute;s Filipe Te&;oacute;filo, Nuno Passos, Lu&;iacute;s Paulo Reis, and Henrique Lopes Cardoso.
Adapting strategies to opponent models in incomplete information games: a reinforcement
learning approach for poker. In Autonomous and Intelligent Systems,
pages 220–227. 2012.
[5] Matthew L Ginsberg. Gib: Imperfect information in a computationally challenging
game. Journal of Artificial Intelligence Research, 14:303–358, 2001.
[6] Matthew L Ginsberg. Gib: Steps toward an expert-level bridge-playing program. In
Proceedings of International Joint Conference on Artificial Intelligence, pages 584–
593, 1999.
[7] Takahisa Ando and Takao Uehara. Reasoning by agents in computer bridge bidding.
In Computers and Games, pages 346–364. 2001.
[8] Asaf Amit and Shaul Markovitch. Learning to bid in bridge. Machine Learning,
63(3):287–327, 2006.
[9] Lori L DeLooze and James Downey. Bridge bidding with imperfect information. In
IEEE Symposium on Computational Intelligence and Games, pages 368–373. IEEE,
2007.
[10] Ming-Sheng Chang. Building a fast double-dummy bridge solver. 1996.
[11] Alina Beygelzimer, Varsha Dani, Tom Hayes, John Langford, and Bianca Zadrozny.
Error limiting reductions between classification tasks. In Proceedings of the 22nd
International Conference on Machine Learning, pages 49–56, 2005.
[12] Zhi-Hua Zhou and Xu-Ying Liu. On multi-class cost-sensitive learning. Computational
Intelligence, 26(3):232–257, 2010.
[13] Han-Hsing Tu and Hsuan-Tien Lin. One-sided support vector regression for multiclass
cost-sensitive classification. In Proceedings of the 27th International Conference
on Machine Learning, pages 1095–1102, 2010.
[14] Wei Li, Xuerui Wang, Ruofei Zhang, Ying Cui, Jianchang Mao, and Rong Jin. Exploitation
and exploration in a performance based contextual advertising system. In
Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, pages 27–36, 2010.
[15] Wei Chu, Lihong Li, Lev Reyzin, and Robert E Schapire. Contextual bandits with
linear payoff functions. In International Conference on Artificial Intelligence and
Statistics, pages 208–214, 2011.
[16] John Langford and Tong Zhang. The epoch-greedy algorithm for contextual multiarmed
bandits. Advances in Neural Information Processing Systems, 20:1096–1103,
2007.
[17] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed
bandit problem. Machine learning, 47(2-3):235–256, 2002.
[18] John Langford and Tong Zhang. The epoch-greedy algorithm for multi-armed bandits
with side information. In Advances in Neural Information Processing Systems
20, pages 817–824, 2008.
[19] Yves Costel. Wbridge5 bridge software, 2014. URL: http://www.wbridge5.
com/.
[20] Introduction to bridge scoring, 2005. URL: http://www.acbl.org/learn/
scoreTeams.html.
[21] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines.
ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27,
2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/
libsvm.
[22] Standard american, 2014. URL: http://en.wikipedia.org/wiki/
Standard_American.


QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
系統版面圖檔 系統版面圖檔