跳到主要內容

臺灣博碩士論文加值系統

(44.192.115.114) 您好!臺灣時間:2023/09/29 12:05
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:沈暐原
研究生(外文):Wei-Yuan Shen
論文名稱:以主動採樣對和點解決大型線性二分排行
論文名稱(外文):Active Sampling of Pairs and Points for Large-scale Linear Bipartite Ranking
指導教授:林軒田
指導教授(外文):Hsuan-Tien Lin
口試委員:林守德李育杰
口試委員(外文):Shou-De LinYuh-Jye Lee
口試日期:2013-07-09
學位類別:碩士
校院名稱:國立臺灣大學
系所名稱:資訊工程學研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2013
畢業學年度:101
語文別:英文
論文頁數:33
中文關鍵詞:機器學習二分排行二元分類大型資料主動學習
外文關鍵詞:bipartite rankingbinary classificationlarge-scaleactive learningAUC
相關次數:
  • 被引用被引用:0
  • 點閱點閱:182
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
二分排行(Bipartite Ranking) 是一種在機器學習中基礎的排行 (Ranking) 問題,其目的在從輸入資料中學習如何將相關的樣品正確的排在非相關樣品之前。對式 (Pair-wise) 解法是其中一種解決二分排行的主要方式,它將二分排行問題轉變成在樣品「對」的二元分類 (Binary Classification) 問題,以學習在一對樣品中何者該排在另一物之前的模型來解決二分排行問題。然而,這類方法通常不適用於大型的輸入資料,因為「對」的數目往往是輸入資料大小的平方,過量的「對」會造成計算資源不足而無法解決問題。另一方面,點式(Point-wise) 也是一種解決二分排行的常見方式,它以樣品「點」的二元分類問題來近似二分排行問題,在輸入資料中學習樣品「點」是否相關。因為「點」的數量往往遠小於「對」的數量,使得這類方法可以用於大型資料上,但是可能會得到較低的正確率。綜合以上討論,我們了解要正確且有效率的解決大型二分排行問題是一件困難的工作。因此,在這篇論文中,我們提出了結合二分排行與二元分類的架構 (Combined Ranking and Classification) 以正確得解決二分排行問題。這個架構利用了將「點」視為一種「虛對」的想法,融合了對式與點式的二分排行方法。除此之外,為了有效率的解決大型二分排行問題,我們在 CRC 的架構下設計了主動採樣(Active Sampling) 演算法。此方法設計的想法來自於機器學習中的主動學習 (Active Learning) 問題,這個採樣法讓我們在大量的樣品「對」中只利用少量的樣品「對」有效率的達到不錯的正確率。最後,在14 個現實大型資料集中,實驗結果顯示我們所提出的主動採樣對和點演算法搭配上線性支持向量機 (SVM) 可以有效率的解決大型二分排行問題,且通常達到比目前許多先進的二分排行演算法還要高的準確性。

Bipartite ranking is a fundamental ranking problem that learns to order relevant instances ahead of irrelevant ones. One major approach for bipartite ranking, called the pair-wise approach, tackles an equivalent binary classification problem of whether one instance out of a pair of instances should be ranked higher than the other. Nevertheless, the number of instance pairs constructed from the input data could be quadratic to the size of the input data, which makes pair-wise ranking generally infeasible on large-scale data sets. Another major approach for bipartite ranking, called the point-wise approach, directly solves a binary classification problem between relevant and irrelevant instance points. This approach is feasible for large-scale data sets, but the resulting ranking performance can be inferior. That is, it is difficult to conduct bipartite ranking accurately and efficiently at the same time. In this thesis, we propose a general Combined Ranking and Classification (CRC) framework to accurately conduct bipartite ranking. The framework unifies point-wise and pair-wise approaches and is simply based on the idea of treating each instance point as a pseudo-pair. Moreover, we develop a novel scheme within the framework to conduct bipartite ranking efficiently. The scheme, called Active Sampling, is inspired from the rich field of active learning and can reach a competitive ranking performance while focusing only on a small subset of the many pairs during training. Experiments on 14 real-word large-scale data sets demonstrate that the proposed algorithm of Active Sampling within CRC, when coupled with a linear Support Vector Machine, usually outperforms state-of-the-art point-wise and pair-wise ranking approaches in terms of both accuracy and efficiency.

誌謝iii
摘要v
Abstract vii
1 Introduction 1
2 Setup and Related Works 5
3 Bipartite Ranking with Active Querying 9
3.1 Combined Ranking and Classification . . . . . . . . . . . . . . . . . . . 9
3.2 Pool-based Active Learning . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Active Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 Sampling Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4 Experiments 15
4.1 Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.2 Experiment Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.3 Performance Comparison and Robustness . . . . . . . . . . . . . . . . . 16
4.3.1 The Usefulness of Bias Correction for Soft Version Samplings . . 17
4.3.2 Soft Version Samplings . . . . . . . . . . . . . . . . . . . . . . . 17
4.3.3 Hard Version Samplings . . . . . . . . . . . . . . . . . . . . . . 19
4.4 Efficiency Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.5 The Usefulness of Larger Budget . . . . . . . . . . . . . . . . . . . . . . 20
4.6 The Usefulness of the CRC Framework . . . . . . . . . . . . . . . . . . 21
4.7 Experiment Result Summary . . . . . . . . . . . . . . . . . . . . . . . . 23
5 Conclusion 29
Bibliography 31

Bibliography
[1] Shivani Agarwal and Dan Roth. A study of the bipartite ranking problem in machine
learning. University of Illinois at Urbana-Champaign, 2005.
[2] Nir Ailon. An active learning algorithm for ranking from pairwise preferences with
an almost optimal query complexity. The Journal of Machine Learning Research,
13:137–164, 2012.
[3] Nir Ailon and Mehryar Mohri. An efficient reduction of ranking to classification.
arXiv preprint arXiv:0710.2889, 2007.
[4] Maria-Florina Balcan, Nikhil Bansal, Alina Beygelzimer, Don Coppersmith, John
Langford, and Gregory B Sorkin. Robust reductions from ranking to classification.
Machine learning, 72(1-2):139–153, 2008.
[5] Eric B Baum and Frank Wilczek. Supervised learning of probability distributions by
neural networks. In Neural Information Processing Systems, pages 52–61, 1988.
[6] Ulf Brefeld and Tobias Scheffer. AUC maximizing support vector learning. In International
Conference on Machine learning Workshop on ROC Analysis in Machine
Learning, 2005.
[7] Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton,
and Greg Hullender. Learning to rank using gradient descent. In International
Conference on Machine learning, pages 89–96, 2005.
[8] Christopher JC Burges, Quoc Viet Le, and Robert Ragno. Learning to rank with
nonsmooth cost functions. Neural Information Processing Systems, 19:193–200,
2007.
[9] Rich Caruana, Thorsten Joachims, and Lars Backstrom. Kdd-cup 2004: results and
analysis. ACM SIGKDD Explorations Newsletter, 6(2):95–108, 2004.
[10] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines.
ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27,
2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[11] Stéphan Clemençon, Gabor Lugosi, and Nicolas Vayatis. Ranking and empirical
minimization of U-statistics. The Annals of Statistics, 36(2):844–874, 2008.
[12] Corinna Cortes and Mehryar Mohri. AUC optimization vs. error rate minimization.
Neural Information Processing Systems, 16(16):313–320, 2004.
[13] Pinar Donmez and Jaime G Carbonell. Optimizing estimated loss reduction for active
sampling in rank learning. In International Conference on Machine learning, pages
248–255, 2008.
[14] Pinar Donmez and Jaime G Carbonell. Active sampling for rank learning via optimizing
the area under the roc curve. Advances in Information Retrieval, pages
78–89, 2009.
[15] John Duchi, Lester Mackey, and Michael I Jordan. On the consistency of ranking
algorithms. In International Conference on Machine learning, pages 327–334, 2010.
[16] Şeyda Ertekin and Cynthia Rudin. On equivalence relationships between classification
and ranking algorithms. The Journal of Machine Learning Research, 12:2905–
2929, 2011.
[17] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin.
LIBLINEAR: A library for large linear classification. The Journal of Machine Learning
Research, 9:1871–1874, 2008.
[18] Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters,
27(8):861–874, 2006.
[19] Yoav Freund, Raj Iyer, Robert E Schapire, and Yoram Singer. An efficient boosting
algorithm for combining preferences. The Journal of Machine Learning Research,
4:933–969, 2003.
[20] Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line
learning and an application to boosting. Journal of Computer and System Sciences,
55(1):119–139, 1997.
[21] Glenn Fung, Romer Rosales, and Balaji Krishnapuram. Learning rankings via convex
hull separation. Neural Information Processing Systems, 18:395, 2006.
[22] Johannes Fürnkranz, Eyke Hüllermeier, Eneldo Loza Mencía, and Klaus Brinker.
Multilabel classification via calibrated label ranking. Machine Learning, 73(2):133–
153, 2008.
[23] Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Large margin rank boundaries
for ordinal regression. Neural Information Processing Systems, pages 115–132,
1999.
[24] Daniel G Horvitz and Donovan J Thompson. A generalization of sampling without
replacement from a finite universe. Journal of the American Statistical Association,
47(260):663–685, 1952.
[25] Thorsten Joachims. Training linear svms in linear time. In ACM SIGKDD Conference
on Knowledge Discovery and Data Mining, pages 217–226, 2006.
[26] Moshe Lichman Kevin Bache. UCI machine learning repository, 2013.
[27] Wojciech Kotłlowski, Krzysztof Dembczynski, and Eyke Hüllermeier. Bipartite
ranking through minimization of univariate loss. In International Conference on
Machine learning, pages 1113–1120, 2011.
[28] David D Lewis and William A Gale. A sequential algorithm for training text classifiers.
In ACM SIGIR Conference on Research and Development in Information
Retrieval, pages 3–12, 1994.
[29] Tie-Yan Liu. Learning to rank for information retrieval. Foundations and Trends in
Information Retrieval, 3(3):225–331, 2009.
[30] Shyamsundar Rajaram, Charlie K Dagli, Nemanja Petrovic, and Thomas S Huang.
Diverse active ranking for multimedia search. In Computer Vision and Pattern
Recognition, pages 1–8, 2007.
[31] Nicholas Roy and Andrew McCallum. Toward optimal active learning through
monte carlo estimation of error reduction. In International Conference on Machine
learning, pages 441–448, 2001.
[32] Cynthia Rudin and Robert E Schapire. Margin-based ranking and an equivalence
between AdaBoost and RankBoost. The Journal of Machine Learning Research,
10:2193–2232, 2009.
[33] D Sculley. Combined regression and ranking. In ACM SIGKDD Conference on
Knowledge Discovery and Data Mining, pages 979–988, 2010.
[34] Burr Settles. Active learning literature survey. University of Wisconsin, Madison,
2010.
[35] Burr Settles, Mark Craven, and Soumya Ray. Multiple-instance active learning. In
Neural Information Processing Systems, 2008.
[36] Harald Steck. Hinge rank loss and the area under the ROC curve. European Conference
on Machine Learning, pages 347–358, 2007.
[37] Grigorios Tsoumakas and Ioannis Katakis. Multi-label classification: An overview.
International Journal of Data Warehousing and Mining, 3(3):1–13, 2007.
[38] Vladimir Vapnik. The nature of statistical learning theory. Springer, 1999.
[39] Kuan-Wei Wu, Chun-Sung Ferng, Chia-Hua Ho, An-Chun Liang, Chun-Heng
Huang, Wei-Yuan Shen, Jyun-Yu Jiang, Ming-Hao Yang, Ting-Wei Lin, Ching-Pei
Lee, et al. A two-stage ensemble of diverse models for advertisement ranking in
KDD Cup 2012. In ACM SIGKDD KDD-Cup WorkShop, 2012.
[40] Hwanjo Yu. Svm selective sampling for ranking with application to data retrieval. In
ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 354–
363, 2005.
[41] Guo-Xun Yuan, Chia-Hua Ho, and Chih-Jen Lin. Recent advances of large-scale
linear classification. Proceedings of the IEEE, 100(9):2584–2603, 2012.

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top