# National Digital Library of Theses and Dissertations in Taiwan



### Detailed Record


• Cited by: 0
• Views: 490
• Rating: —
• Downloads: 0
• Bookmarked: 0
The contextual bandit problem (情境式拉霸問題) is often used to model online applications such as article recommendation systems. However, we observe that these online applications have characteristics that the traditional contextual bandit problem cannot capture, such as taking multiple actions in a single round. We therefore propose a new problem, the contextual bandit with multiple actions, to model this characteristic. We adapt several existing methods to the new problem, and, exploiting its properties, we also propose a pairwise regression with upper confidence bound method. Experimental results show that our proposed method outperforms the existing ones.
The contextual bandit problem is usually used to model online applications like article recommendation. However, the problem cannot fully meet some needs of these applications, such as making multiple actions at the same time. We propose a new Contextual Bandit Problem with Multiple Actions (CBMA), which is an extension of the traditional contextual bandit problem and fits these online applications better. We adapt some existing contextual bandit algorithms to the CBMA problem, and propose a new Pairwise Regression with Upper Confidence Bound (PairUCB) algorithm which utilizes the new properties of the CBMA problem. The experimental results demonstrate that PairUCB outperforms the other algorithms.
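The thesis body is not included in this record, so the details of PairUCB are not shown here. As a rough, hypothetical illustration of the upper-confidence-bound idea the abstract builds on, a LinUCB-style round that scores each candidate action's context by an optimistic reward estimate and selects the top-k actions (the multiple-action setting) might look like the following sketch. All names, the shared ridge-regression model, and the top-k selection rule are assumptions for illustration, not the thesis's actual PairUCB algorithm.

```python
import numpy as np

def ucb_topk(contexts, A, b, alpha=1.0, k=3):
    """Score each candidate action with a LinUCB-style upper
    confidence bound and return the indices of the top-k actions.

    contexts: (n_actions, d) matrix, one feature row per action.
    A, b:     ridge-regression statistics (a single shared model
              here, purely for brevity).
    alpha:    scale of the exploration bonus.
    """
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b                       # ridge-regression estimate
    means = contexts @ theta                # estimated rewards
    # per-action confidence width: sqrt(x^T A^{-1} x)
    widths = np.sqrt(np.einsum('ij,jk,ik->i', contexts, A_inv, contexts))
    scores = means + alpha * widths         # optimism under uncertainty
    return np.argsort(scores)[::-1][:k]     # top-k by optimistic score

def update(A, b, x, reward):
    """Standard ridge-regression update after observing one reward."""
    A += np.outer(x, x)
    b += reward * x
    return A, b
```

With an untrained model (A = I, b = 0) the estimated rewards are all zero, so selection is driven entirely by the exploration bonus; as rewards accumulate through `update`, the estimate term takes over.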
Contents

- Thesis Committee Certification
- Acknowledgements
- Abstract (Chinese)
- Abstract
- 1 Introduction
- 2 Preliminary
  - 2.1 Problem Setup
  - 2.2 Related Work
- 3 Approaches
  - 3.1 General Algorithm Framework
  - 3.2 Baseline Approach
    - 3.2.1 Greedy Algorithm
    - 3.2.2 Stochastic Algorithms
    - 3.2.3 Upper Confidence Bound Algorithm
  - 3.3 Proposed Approach
    - 3.3.1 Pairwise Regression with Upper Confidence Bound
    - 3.3.2 Mixed Pairwise and Pointwise Regression with Upper Confidence Bound
- 4 Experiment
  - 4.1 Dataset
  - 4.2 Setup
  - 4.3 Performance
- 5 Conclusion
- Bibliography

1. Applications of Machine Learning on Contract Bridge Bidding
2. Interactive Verification with Machine Learning Methods

No related journal articles.

1. Solving Linear Contextual Bandit Problems with Pseudo-Reward Algorithms
2. Applications of Machine Learning on Contract Bridge Bidding
3. Interactive Verification with Machine Learning Methods
4. A Divide-and-Conquer Approach to Ranking Based on Preference Learning
5. Toward Practical Online Learning
6. Developing Gold Nanoparticles Combined with Conventional Chemotherapy Drugs as a Synergistic Therapy against Oral Squamous Cell Carcinoma
7. Implementation and Application of Distributed Localization in Smart Home Systems
8. Solving Large-Scale Linear Bipartite Ranking with Active Sampling of Pairs and Points
9. Applications of Vinylidene Fluoride–Trifluoroethylene Copolymers in Field-Effect-Transistor Memory Devices
10. Prey Detection and Gleaning Behavior of Tube-Nosed Bats and Painted Bats
11. Design and Synthesis of Porphyrin Derivatives Containing Biuret and Ionic Groups and Characterization of Their Self-Assembled Nanostructures
12. A Tree-Sampling-Based Annotation-Cost-Sensitive Active Learning Algorithm
13. A Study on Recommending Stock Trading Actions with Contextual Bandit Algorithms
14. A Study of Hepatitis B Virus Infection in Children Vaccinated against Hepatitis B
15. Lifetime Distribution and Reliability Analysis of High-Power Light-Emitting Diodes in Thermal Environments
