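Graduate student: 江建德 (Jian-De Jiang)
Thesis title: An Unsupervised Ranking Consistency Approach Based on Knowledge Base and Search Results
Thesis title (Chinese): 以非監督式方法利用知識庫與搜尋結果提升網頁搜尋排序一致性
Advisor: 鄭卜壬 (Pu-Jen Cheng)
Oral defense committee: 陳信希 (Hsin-Hsi Chen), 林守德 (Shou-De Lin), 蔡宗翰 (Tzong-Han Tsai), 陳柏琳 (Ber-Lin Chen)
Oral defense date: 2016-07-27
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2016
Academic year of graduation: 104
Language: English
Number of pages: 42
Keywords: Web Search, Ranking Consistency, Query Intent, Unsupervised Approach, Knowledge Base, Topical Cluster, Query Intent Template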
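For web search systems such as the well-known search engines Google, Yahoo!, and Bing, relevance ranking is the most important problem. Conventional approaches to relevance ranking improve effectiveness by optimizing for each query separately. A previous paper proposed a two-stage supervised learning approach based on the similarity of query intents, improving relevance ranking by enhancing ranking consistency. However, that paper leaves two problems to be addressed. First, its use of learning to rank requires a large volume of query logs, which only mature search engines possess; search systems that are new or still developing must rely on unsupervised methods to improve relevance ranking. Second, that paper uses entities in a knowledge base to represent query intents. Because queries usually contain some specific information, entities cannot fully express the query intent. For example, the query ``Kobe Bryant family'' expresses the intent of learning about Kobe Bryant's family rather than about Kobe Bryant himself.
In this thesis, we propose a two-phase unsupervised approach that uses search results and a knowledge base to improve ranking consistency and relevance ranking, addressing the problem that immature search systems have no query logs. The first phase extracts ranking consistency scores from search results, and the second phase re-ranks the search results by weighing uniqueness against consistency. In addition, adding query templates to the query intent lets us analyze query intents more clearly. To the best of our knowledge, ours is the first work to use an unsupervised ranking consistency approach to improve relevance ranking. Finally, we use Freebase and Yahoo! search results as our experimental data to validate the approach; the results show that our unsupervised approach successfully improves both ranking consistency and relevance ranking.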
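As an illustration of the query template idea, the following minimal Python sketch replaces a recognized entity in a query with a type slot, so that ``Kobe Bryant family'' becomes ``[person] family'', and then groups queries that share a template as having similar intent. The entity dictionary, the type labels, and the function names are hypothetical stand-ins for a knowledge base lookup such as Freebase; this record does not give the thesis's actual procedure, so the code should be read as one possible interpretation only.

```python
import re


def to_intent_template(query, entity_types):
    """Replace the first known entity in `query` with a "[type]" slot.

    `entity_types` maps entity names to type labels. It is a hypothetical
    stand-in for a knowledge base lookup. Returns None when no known
    entity occurs in the query.
    """
    normalized = " ".join(query.lower().split())
    # Try longer names first so a full name wins over a shorter alias.
    for name in sorted(entity_types, key=len, reverse=True):
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        match = re.search(pattern, normalized)
        if match:
            slot = "[" + entity_types[name] + "]"
            return normalized[:match.start()] + slot + normalized[match.end():]
    return None


def group_by_template(queries, entity_types):
    """Group queries that share the same intent template."""
    groups = {}
    for query in queries:
        template = to_intent_template(query, entity_types)
        if template is not None:
            groups.setdefault(template, []).append(query)
    return groups


# Toy entity dictionary (assumed data, not taken from Freebase).
ENTITY_TYPES = {
    "Kobe Bryant": "person",
    "LeBron James": "person",
    "Taipei": "city",
}

QUERIES = ["Kobe Bryant family", "lebron james family", "Taipei weather", "cheap flights"]
GROUPS = group_by_template(QUERIES, ENTITY_TYPES)
```

In this toy setting the two ``family'' queries fall into the same group, which is the kind of similar-intent set over which ranking consistency could then be measured. Queries with no recognized entity are simply left out.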

Relevance ranking is the most important problem in web search systems such as Google, Yahoo!, and Bing. Most conventional approaches focus on optimizing the ranking model for each query separately. One previous work proposes a two-stage supervised approach that improves relevance ranking by enhancing ranking consistency across queries with similar search intents. However, that work has two crucial problems. First, it uses pairwise learning to rank to learn consistency, a method that relies on large-scale query logs, which only a few mature web search systems have. Most developing search engines need to improve their performance without query logs. Second, it models query intents as entities in a knowledge base. Entities, however, cannot completely represent query intents, because queries contain specific information needs; for example, the query ``Kobe Bryant family'' asks about the family rather than the person. In this work, we propose a two-phase unsupervised approach that improves ranking consistency using a knowledge base and search results. The first phase extracts consistency from search results, and the second phase re-ranks search results by leveraging consistency and uniqueness. Furthermore, we add query templates to capture query intents more completely. To the best of our knowledge, our work is the first unsupervised method that uses ranking consistency to improve relevance ranking. We conducted extensive experiments using Freebase and search results from the Yahoo! search engine, and the results demonstrate that our approach significantly improves ranking consistency and relevance ranking.
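The two phases described above can be sketched roughly as follows. This is a minimal Python illustration under several assumptions that the abstract does not confirm: topical clusters are approximated by URL host names, consistency is aggregated over similar-intent queries with a reciprocal rank score of the form 1/(k + rank) with k = 60, "uniqueness" is read as the query's own original ranking, and a single weight `lam` blends the two. All function names, parameters, and URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def cluster_of(url):
    """Assumed topical cluster of a result: the host name of its URL."""
    return urlparse(url).netloc


def consistency_rank(similar_results, k=60):
    """Phase 1 (sketch): rank clusters across similar-intent result lists.

    Each cluster is credited 1 / (k + position) once per list, at the
    position of its best-ranked URL. Returns {cluster: rank}, rank 1 best.
    """
    scores = defaultdict(float)
    for ranked_urls in similar_results:
        seen = set()
        for position, url in enumerate(ranked_urls, start=1):
            cluster = cluster_of(url)
            if cluster not in seen:
                seen.add(cluster)
                scores[cluster] += 1.0 / (k + position)
    # Break score ties alphabetically so the output is deterministic.
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return {cluster: rank for rank, cluster in enumerate(ordered, start=1)}


def rerank(original, cluster_ranks, lam=0.5, k=60):
    """Phase 2 (sketch): blend the original rank with the consistency rank.

    lam = 0 keeps the original order; larger lam trusts consistency more.
    """
    scored = []
    for position, url in enumerate(original, start=1):
        own = 1.0 / (k + position)
        rank = cluster_ranks.get(cluster_of(url))
        shared = 1.0 / (k + rank) if rank is not None else 0.0
        scored.append(((1 - lam) * own + lam * shared, position, url))
    # Higher score first; ties keep the original order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, _, url in scored]


# Toy result lists for three queries assumed to share one intent template.
SIMILAR = [
    ["https://wiki.example/A", "https://films.example/a", "https://social.example/a"],
    ["https://films.example/b", "https://wiki.example/B", "https://blog.example/b"],
    ["https://wiki.example/C", "https://films.example/c"],
]
ORIGINAL = ["https://other.example/x", "https://films.example/x", "https://wiki.example/X"]

RANKS = consistency_rank(SIMILAR)
RERANKED = rerank(ORIGINAL, RANKS)
```

With these toy lists, the cluster that similar-intent queries rank highest moves to the top of the re-ranked list, while a result from a cluster never seen in the similar queries keeps only its original-rank score. Setting `lam=0` reproduces the original ranking, which makes the blend easy to sanity-check.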

Contents
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
2 Related Work
2.1 Ranking Consistency
2.2 Federated Web Search
2.3 URL Patterns of Web Pages
2.4 Search Intents behind Queries
3 Problem Definition
3.1 Notations of Given Data
3.2 Problem Definition
3.3 Flow of Approach and Notations
4 Consistency Rank
4.1 Similar Query Intents
4.1.1 Named Entity Recognizing Query
4.1.2 Query Intent Template
4.1.3 Similar Query Intent Set
4.2 Topical Clusters
4.2.1 URL Pattern
4.2.2 URL Sub-pattern
4.3 Consistency Rank Extraction
5 Re-rank Model
5.1 Merging Ranks
5.2 Rank Score Function
5.2.1 Ranking Properties
5.2.2 Reciprocal Rank Function
5.3 Leveraging Unique and Consistency
5.3.1 Model Formulation
5.3.2 Multiple Parameters
5.3.3 Optimization Based on Unique and Consistency
5.4 Re-Ranking Features
5.4.1 Consistency Features
5.4.2 Unique Features
6 Experiments
6.1 Datasets and Experimental Settings
6.2 Experimental Results
6.2.1 Evaluation of Ranking Consistency
6.2.2 Evaluation of Unsupervised Approach
6.2.3 Evaluation with Different Parameters
6.2.4 Re-ranking Feature Weights
7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
Bibliography

