
National Digital Library of Theses and Dissertations in Taiwan

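Author: 張博詞
Author (English): Po-Tzu Chang
Thesis title (Chinese): 將機率模型以及圖形隨機漫步理論應用在時序資料以改良網頁搜尋品質
Thesis title (English): Combining probabilistic model with graph-based random walk to improve search quality through exploiting time-sensitive query information
Advisor: 林守德
Advisor (English): Shou-De Lin
Oral defense committee: 陳信希, 鄭卜壬
Oral defense date: 2011-07-14
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2011
Academic year of graduation: 99 (ROC calendar)
Language: English
Number of pages: 46
Keywords (translated from Chinese): time-sensitive queries; search engine optimization
Keywords (English): time sensitive queries; search engine reranking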
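Modern search engines make searching easy: the user enters a query and the search engine returns related results. However, the intent behind a query changes over time, so time-related information can be used to optimize the results a search engine returns for time-sensitive queries.
In this thesis, we propose new re-ranking methods based on time-related information to improve the quality of search results. We optimize for two kinds of data:
1. Data in which the queries have time-related information.
2. Data in which the queries have no time-related information.
Our main approach adds time-related features to support vector regression to optimize the ranking of search results.
Our experimental results show that, on the data in which queries have time-related information, using that information yields an improvement of about 10.28% over the original ranking. On the data in which queries have no time-related information, the improvement is 1.14%.
At the end of this thesis, we analyze the features generated from time-related information and compare their strengths and weaknesses.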


Search engine services provide platforms on which users express their search intent through queries. The intent behind a query may vary across time periods, so time-related information should be taken into consideration when a search engine returns search results.

In this paper, we present new re-ranking methods based on time information to improve search result quality. The methods re-rank search results using time-sensitive information and target the following two situations:
1. Existed Queries dataset: URLs clicked by queries have sufficient time click information in the training data.
2. Rare Queries dataset: URLs clicked by queries have no click information in the training data; this also covers the bad search results dataset.

We propose SVM Regression with time-related features to re-rank the search results of each query according to the number of clicks in each time period. On the Existed Query dataset, we propose features generated by three methodologies: (a) Probabilistic Prior, (b) Probabilistic Model using Language Model and KL-divergence, and (c) a Page Rank approach based on time clicks.
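To make the KL-divergence feature in (b) more concrete, the following is a minimal sketch of how per-period click counts could be compared. The abstract does not give the thesis's exact formulation, so everything here is an assumption for illustration: the function names, the additive smoothing, the direction of the divergence, and the click counts are all hypothetical.

```python
import math

def smoothed_distribution(counts, alpha=1.0):
    """Turn raw per-period click counts into a probability distribution.

    Additive smoothing (an assumption, not taken from the thesis) keeps
    every period's probability above zero so the divergence stays finite.
    """
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same time periods."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical click counts over four time periods.
query_clicks = [10, 0, 2, 8]
url_a_clicks = [9, 1, 1, 9]    # temporal profile similar to the query
url_b_clicks = [0, 12, 7, 1]   # temporal profile unlike the query

query_dist = smoothed_distribution(query_clicks)
score_a = kl_divergence(query_dist, smoothed_distribution(url_a_clicks))
score_b = kl_divergence(query_dist, smoothed_distribution(url_b_clicks))
```

A lower divergence means the URL's clicks are distributed over time more like the query's, so under these assumptions `url_a` would receive the more favorable feature value.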
In addition, because the Rare Query dataset has no click information, we propose two further features for it: (a) clicks extracted from related queries and (b) Time-based Page Rank. We then combine a subset of these features as input to SVM Regression for prediction.
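One way to picture a time-based Page Rank is a random walk on a query-URL click graph whose edge weights are the click counts observed in a single time period. The sketch below is only an illustration under that assumption; the abstract does not specify the graph construction, the damping factor, or the node naming, all of which are hypothetical here.

```python
def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a weighted directed graph.

    edges maps (source, target) -> non-negative weight, for example the
    number of clicks observed in one time period.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for (src, _), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for (src, dst), weight in edges.items():
            if weight > 0:
                new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank

# Hypothetical clicks from one query to two URLs within one time period.
clicks_in_period = {("q:olympics", "url:a"): 8, ("q:olympics", "url:b"): 2}

# Make the walk bidirectional so rank can flow between queries and URLs.
edges = {}
for (query, url), count in clicks_in_period.items():
    edges[(query, url)] = count
    edges[(url, query)] = count

ranks = weighted_pagerank(edges)
```

Running this once per time period would give each URL a period-specific score, which could then serve as one input feature for the regression step.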
Our experimental results show that the proposed approach gains a 10.28% improvement over the original ranking in the AOL query log on the Existed Query dataset. On the Rare Query dataset, SVM Regression gains a 1.14% improvement on Existed queries and a 12.9% improvement on Non-Existed queries.
Finally, we analyze the improvement from each method and discuss their pros and cons.


Acknowledgement
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Problem Definition
1.4 Proposed Solution
1.5 Contribution
1.6 Paper Organization
Chapter 2 Related Works
2.1 How web search is strongly influenced by time
2.2 Using time sensitive information to improve search result quality
2.3 Query suggestion using time sensitive information
Chapter 3 Methodology
3.1 System Overview
3.2 Feature Generation Methodologies when the URL document exists in training data (Existed Query)
3.2.1 Probabilistic Prior
3.2.2 Probabilistic Model with Language Model and KL-divergence
3.2.3 Page Rank and Time Based Page Rank
3.3 SVM Regression on Existed Query
3.4 Feature Extraction when the document does not exist in the training data (Rare Query)
3.5 SVM Regression on rare query
Chapter 4 Experiments
4.1 Data Sets
4.2 Re-Ranking Method
4.3 Evaluation Measures
4.4 Experiment results of Extract Features from Methodologies on Existed Query
4.4.1 Probabilistic prior result on Existed Query
4.4.2 Probabilistic Model using KL-divergence and Language Model similarity measure on Existed Query
4.4.3 Page Rank based improvement on Existed Query
4.5 SVM Regression improvement on Existed Query
4.6 SVM Regression improvement on rare query dataset
4.7 Discussion
4.7.1 The performance between Probabilistic Prior and Probabilistic Model on Existed Query
4.7.2 Methodology improvement on original query dataset
4.7.3 Methodology improvement on rare query dataset
4.7.4 Dataset discussion on Existed Query
4.7.5 Dataset discussion on Non-Existed Query
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future work
Chapter 6 References



