臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record
Author: 張雅軒 (Ya-Hsuan Chang)
Thesis title: 多動作情境式拉霸問題之研究 (Study on Contextual Bandit Problem with Multiple Actions)
Advisor: 林軒田 (Hsuan-Tien Lin)
Committee members: 林守德 (Shou-de Lin), 李育杰 (Yuh-Jye Lee)
Oral defense date: 2013-07-09
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical engineering and computer science
Thesis type: Academic thesis
Year of publication: 2013
Graduation academic year: 101 (ROC calendar)
Language: English
Pages: 37
Keywords (Chinese): 機器學習, 情境式拉霸問題, 信心值上界
Keywords (English): Machine Learning, Contextual Bandit Problem, Upper Confidence Bound
Abstract (Chinese, translated): The contextual bandit problem is often used to model online applications such as article recommendation systems. However, we observe that these online applications have characteristics that the traditional contextual bandit problem cannot model, such as taking multiple actions in a single round. We therefore propose a new problem, the contextual bandit problem with multiple actions, to model this characteristic. We adapt several existing methods to this new problem, and we also propose Pairwise Regression with Upper Confidence Bound, which targets the properties of the new problem. Experimental results show that the proposed method outperforms the existing ones.

Abstract (English): The contextual bandit problem is usually used to model online applications like article recommendation. However, the problem cannot fully meet some needs of these applications, such as taking multiple actions at the same time. We propose a new Contextual Bandit Problem with Multiple Actions (CBMA), which is an extension of the traditional contextual bandit problem and fits the online applications better. We adapt some existing contextual bandit algorithms for our CBMA problem, and propose a new Pairwise Regression with Upper Confidence Bound (PairUCB) algorithm which utilizes the new properties of the CBMA problem. The experimental results demonstrate that PairUCB outperforms the other algorithms.
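The multiple-action setting described in the abstract can be illustrated with a minimal simulation sketch: each round presents per-action feature vectors, the learner picks the k actions with the highest upper-confidence scores, and only the chosen actions reveal their rewards. This is a generic LinUCB-style baseline written to convey the idea, not the thesis's PairUCB algorithm; the function name, the shared linear model, and the data layout are all assumptions for illustration.

```python
import numpy as np

def linucb_multiple_actions(contexts, rewards, k, alpha=1.0):
    """Hypothetical sketch of a UCB-style contextual bandit taking k
    actions per round. contexts: (T, A, d) per-action feature vectors;
    rewards: (T, A) true rewards, revealed only for chosen actions
    (bandit feedback). Returns the cumulative reward collected."""
    T, A, d = contexts.shape
    M = np.eye(d)        # regularized Gram matrix of observed features
    b = np.zeros(d)      # reward-weighted sum of observed features
    total = 0.0
    for t in range(T):
        theta = np.linalg.solve(M, b)         # ridge-regression estimate
        X = contexts[t]                       # (A, d) features this round
        est = X @ theta                       # estimated reward per action
        Minv = np.linalg.inv(M)
        # Confidence width alpha * sqrt(x^T M^{-1} x) for each action.
        width = alpha * np.sqrt(np.einsum('ad,dc,ac->a', X, Minv, X))
        ucb = est + width                     # optimism under uncertainty
        chosen = np.argsort(-ucb)[:k]         # take the k highest scores
        for a in chosen:                      # only chosen rewards observed
            r = rewards[t, a]
            M += np.outer(X[a], X[a])
            b += r * X[a]
            total += r
    return total
```

A usage sketch: generate contexts whose rewards are linear in an unknown weight vector, run the loop, and compare the collected reward against the per-round top-k oracle; the gap between them is the regret the thesis's algorithms aim to shrink.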

Contents

口試委員會審定書 (Committee Certification)
誌謝 (Acknowledgements)
摘要 (Chinese Abstract)
Abstract
1 Introduction
2 Preliminary
  2.1 Problem Setup
  2.2 Related Work
3 Approaches
  3.1 General Algorithm Framework
  3.2 Baseline Approach
    3.2.1 Greedy Algorithm
    3.2.2 Stochastic Algorithms
    3.2.3 Upper Confidence Bound Algorithm
  3.3 Proposed Approach
    3.3.1 Pairwise Regression with Upper Confidence Bound
    3.3.2 Mixed Pairwise and Pointwise Regression with Upper Confidence Bound
4 Experiment
  4.1 Dataset
  4.2 Setup
  4.3 Performance
5 Conclusion
Bibliography

