臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

詳目顯示 (Detailed Record)
Researcher: 阮昱勳 (Yu-Xun Ruan)
Title: 迴歸於序數評等之研究 (Studies on Ordinal Ranking with Regression)
Advisor: 林軒田 (Hsuan-Tien Lin)
Oral Defense Committee: 徐宏民, 林守德, 蔡銘峰
Oral Defense Date: 2011-07-13
Degree: Master
Institution: 國立臺灣大學 (National Taiwan University)
Department: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)
Discipline: Computer Science
Academic Field: Networking
Document Type: Academic thesis
Publication Year: 2011
Graduation Academic Year: 99 (ROC calendar; 2010–2011)
Language: English
Pages: 53
Keywords (Chinese): 序數評等 (ordinal ranking), 排序問題 (ranking problems), 迴歸分析 (regression analysis), 成本資訊 (cost information)
Keywords (English): ordinal ranking, list-wise ranking, cost-sensitive, regression
Ranking has become an increasingly popular research topic in recent years, with broad applications in web search engines, recommendation systems, and many other areas. As the amount of information on the web explodes, quickly training a good ranking system from such massive data has become an increasingly important problem. This thesis focuses on handling large ordinal data sets; unlike the categories in traditional classification problems, the categories in an ordinal data set carry information about the gaps between ranks. We study how to use regression analysis on ordinal data sets to handle two types of ranking-related problems. The first is rating the data: the ranking system decides which ordinal rank a given instance belongs to. The second is the top-rank problem, which studies how to sort the documents relevant to a search query by their degree of relevance to produce a search result list. We propose a framework that can be combined with a wide range of regression algorithms to handle ranking problems; the framework introduces the notion of cost to exploit the rank-gap information. Experimental results on both problems show that the framework effectively improves ranking performance. We further derive an effective cost function for a popular evaluation criterion of the top-rank problem, and experimental results show that this cost function improves ranking performance under that criterion even more.
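
The framework described above (named COCR in the English abstract below) reduces an ordinal data set with K ranks to K - 1 binary questions of the form "is the rank greater than k?", solves each with point-wise regression, and derives per-example weights from the cost so that the rank-gap information is exploited. Below is a minimal sketch of this reduction in Python, assuming scikit-learn decision-tree regressors in place of the WEKA learners used in the thesis; the function names and the squared-cost matrix are illustrative, not the thesis's own code.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def squared_cost(K):
        # C[y-1, k-1] = (y - k)^2: cost of predicting rank k when the true rank is y.
        ranks = np.arange(1, K + 1)
        return (ranks[:, None] - ranks[None, :]) ** 2

    def cocr_fit(X, y, K, C):
        # One regressor per threshold k = 1..K-1, trained on the binary
        # question "is the true rank greater than k?". Each example is
        # weighted by the cost difference between predicting k and k + 1,
        # so examples whose rank lies far from the threshold matter more.
        models = []
        for k in range(1, K):
            z = (y > k).astype(float)
            w = np.abs(C[y - 1, k - 1] - C[y - 1, k])
            g = DecisionTreeRegressor(max_depth=8)
            g.fit(X, z, sample_weight=w)
            models.append(g)
        return models

    def cocr_predict(models, X):
        # Predicted rank = 1 + number of thresholds the example passes.
        return 1 + sum((g.predict(X) > 0.5).astype(int) for g in models)

    # Toy usage: 100 examples with 3 features and 5 ordinal ranks.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(100, 3)), rng.integers(1, 6, size=100)
    models = cocr_fit(X, y, K=5, C=squared_cost(5))
    print(cocr_predict(models, X[:5]))

Note that with the absolute cost |y - k| the weights collapse to 1 and the reduction becomes plain unweighted regression; the squared cost above is one choice that actually uses the rank-gap information.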

Ranking has been a popular research problem in recent years and has been applied in a wide range of applications, including web-search engines and recommendation systems. In this thesis, we study two ranking problems: the ordinal ranking problem and the top-rank problem. We propose a novel ranking approach, cost-sensitive ordinal classification via regression (COCR), which respects the discrete nature of the ordinal ranks in real-world data sets. In particular, COCR applies a theoretically sound reduction from ordinal to binary classification and solves the binary classification sub-tasks with point-wise regression. Furthermore, COCR allows specifying mis-ranking costs to further boost the ranking performance. We conduct experiments on the two ranking problems separately. On the ordinal ranking problem, we compare different approaches based on decision trees. The results show that the proposed COCR can perform better on many data sets when coupled with an appropriate cost. On the top-rank problem, we derive the cost that corresponds to a popular ranking criterion, the Expected Reciprocal Rank (ERR), and plug this cost into the COCR approach. The resulting ERR-tuned COCR enjoys both the efficiency of point-wise regression and the top-rank prediction accuracy of the ERR criterion. Evaluations on two large-scale data sets, ``Yahoo! Learning to Rank Challenge'' and ``Microsoft Learning to Rank'', verify that some basic COCR settings significantly outperform commonly used regression approaches, and that even better performance can often be achieved by the ERR-tuned COCR.
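
ERR, the criterion the abstract tunes COCR toward, is defined in Chapelle et al. (2009, cited below): a user scans the list from the top, stops at position r with a probability determined by the relevance grade there, and the metric is the expected reciprocal of the stopping position. Below is a minimal sketch of the ERR computation in Python, assuming the 0–4 relevance grades used by the Yahoo! and Microsoft data sets; the function name err is illustrative.

    import numpy as np

    def err(grades, g_max=4):
        # Stopping probability per document: R(g) = (2^g - 1) / 2^g_max.
        R = (2.0 ** np.asarray(grades, dtype=float) - 1.0) / 2.0 ** g_max
        score, p_reach = 0.0, 1.0  # p_reach: probability of reaching position r
        for r, R_r in enumerate(R, start=1):
            score += p_reach * R_r / r
            p_reach *= 1.0 - R_r
        return score

    print(err([4, 0, 0]))  # 0.9375: a perfect document at rank 1
    print(err([0, 0, 4]))  # 0.3125: the same document demoted to rank 3

Because later positions are discounted both by 1/r and by the probability of getting past earlier relevant documents, ERR strongly rewards placing the best documents at the very top, which is why the thesis derives a cost aimed specifically at this criterion.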

Acknowledgements i
Chinese Abstract iii
Abstract v
1 Introduction 1
1.1 Ranking Problem Statement...1
1.2 Preliminaries...3
1.3 Related Work...4
1.4 Contribution...7
2 Cost-sensitive Ordinal Classification via Regression 9
2.1 Reduction to Binary Classification...9
2.2 Replacing Binary Classification with Regression...12
2.3 Post-processing of Predictions...14
3 Studies on Ordinal Ranking Problem 17
3.1 Comparison Using Decision Trees...17
3.1.1 Data Sets and Experimental Setup...18
3.1.2 Experimental Results on Equal-value Data Sets...20
3.1.3 Experimental Results on Equal-frequency Data Sets...22
3.1.4 Experimental Results on Real-world Data Sets...22
3.1.5 Additional Comparison on Equal-value Data Sets...23
3.2 The Relationship between the Cost and the Evaluation Criteria...23
4 Studies on Top-rank Problem 33
4.1 Ranking List Criteria...33
4.1.1 NDCG: Normalized Discounted Cumulative Gain...33
4.1.2 ERR: Expected Reciprocal Rank... 34
4.2 An Error Bound on ERR...35
4.3 Optimistic ERR Cost...37
4.4 Experiments...39
4.4.1 Data Sets...39
4.4.2 Base Regression Algorithms...41
4.4.3 Comparison Using Linear Regression...41
4.4.4 Comparison Using M5P...44
4.4.5 Comparison Using Bagging-M5P...45
4.4.6 Comparison Using GBRT... 47
5 Conclusion 49
Bibliography 51

Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
Christopher J.C. Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. Learning to rank using gradient descent. In Proceedings of ICML ’05, pages 89–96. ACM, 2005.
Christopher J.C. Burges, Robert Ragno, and Quoc V. Le. Learning to rank with nonsmooth cost functions. In Advances in Neural Information Processing Systems (NIPS), volume 18, pages 193–200. MIT Press, 2006.
Jaime S. Cardoso and Joaquim F. Pinto da Costa. Learning to classify ordinal data: The data replication method. Journal of Machine Learning Research, 8:1393–1429, 2007.
Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. Expected reciprocal rank for graded relevance. In Proceedings of CIKM ’09, pages 621–630. ACM, 2009.
Wei Chu and Zoubin Ghahramani. Gaussian processes for ordinal regression. Journal of Machine Learning Research, 6:1019–1041, 2005.
David Cossock and Tong Zhang. Subset ranking using regression. In Proceedings of COLT ’06, pages 605–619. Springer, 2006.
Koby Crammer and Yoram Singer. Pranking with ranking. In Advances in Neural Information Processing Systems (NIPS), volume 14, pages 641–647. MIT Press, 2002.
Andrew Frank and Arthur Asuncion. UCI machine learning repository, 2010. Downloadable at http://archive.ics.uci.edu/ml.
Eibe Frank and Mark Hall. A simple approach to ordinal classification. In Proceedings of ECML ’01, pages 145–156. Springer, 2001.
Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.
Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29:1189–1232, 2001.
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software: An update. SIGKDD Explorations Newsletter, 11(1):10–18, 2009.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2003.
Kalervo Järvelin and Jaana Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4):422–446, 2002.
Thorsten Joachims. Optimizing search engines using clickthrough data. In Proceedings of KDD ’02, pages 133–142. ACM, 2002.
Sotiris Kotsiantis and Panagiotis Pintelas. A cost sensitive technique for ordinal classification problems. In George Vouros and Themistoklis Panayiotopoulos, editors, Methods and Applications of Artificial Intelligence, volume 3025 of Lecture Notes in Computer Science, pages 220–229. Springer Berlin / Heidelberg, 2004.
Stefan Kramer, Gerhard Widmer, Bernhard Pfahringer, and Michael De Groeve. Prediction of ordinal classes using regression trees. Fundamenta Informaticae, 47:1–13, 2001.
Ping Li, Christopher J.C. Burges, and Qiang Wu. McRank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems (NIPS), volume 19. MIT Press, 2007.
Hsuan-Tien Lin and Ling Li. Reduction from cost-sensitive ordinal ranking to weighted binary classification. Technical report, National Taiwan University, April 2011.
Tie-Yan Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3:225–331, 2009.
Yuanhua Lv, Taesup Moon, Pranam Kolari, Zhaohui Zheng, Xuanhui Wang, and Yi Chang. Learning to model relatedness for news recommendation. In Proceedings of WWW ’11, pages 57–66. ACM, 2011.
Alistair Moffat and Justin Zobel. Rank-biased precision for measurement of retrieval effectiveness. ACM Transactions on Information Systems, 27(1), 2008.
Ananth Mohan, Zheng Chen, and Kilian Q. Weinberger. Web-search ranking with initialized gradient boosted regression trees. Journal of Machine Learning Research Workshop and Conference Proceedings, 14:77–89, 2011.
J. Ross Quinlan. Learning with continuous classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, pages 343–348. World Scientific, 1992a.
J. Ross Quinlan. C4.5: Programs for Machine Learning (Morgan Kaufmann Series in Machine Learning). Morgan Kaufmann, 1992b.
Matthew Richardson, Amit Prakash, and Eric Brill. Beyond PageRank: Machine learning for static ranking. In Proceedings of WWW ’06, pages 707–715. ACM, 2006.
Hamed Valizadegan, Rong Jin, Ruofei Zhang, and Jianchang Mao. Learning to rank by optimizing NDCG measure. In Advances in Neural Information Processing Systems (NIPS), volume 22. MIT Press, 2009.
Maksims N. Volkovs and Richard S. Zemel. BoltzRank: Learning to maximize expected ranking gain. In Proceedings of ICML ’09, pages 1089–1096. ACM, 2009.
Yong Wang and Ian H. Witten. Induction of model trees for predicting continuous classes. In Proceedings of ECML ’97, pages 128–137. Springer, 1997.
