Author (English): Po-Tzu Chang
Thesis title (English): Combining probabilistic model with graph-based random walk to improve search quality through exploiting time-sensitive query information
Advisor (English): Shou-De Lin
Keywords (English): time sensitive queries; search engine reranking
1. Data whose queries carry time-related information.
2. Data whose queries carry no time-related information.

Search engine services let users express their intent through queries. The intent behind a query may vary across time periods, so time-related information should be taken into consideration when a search engine returns results.

In this paper, we present new re-ranking methods that use time information to improve search result quality. We aim to re-rank search results according to time-sensitive information in the following two situations:
1. Existed Queries dataset: URLs clicked for the queries have sufficient time-stamped click information in the training data.
2. Rare Queries dataset: URLs clicked for the queries have no click information in the training data, and the search results are poor.

We propose SVM Regression with time-related features to re-rank the search results of each query according to the number of clicks in each time period. For the Existed Query dataset, we propose features generated from three methodologies: (a) Probabilistic Prior, (b) Probabilistic Model using Language Model and KL-divergence, and (c) a Page Rank approach based on time clicks.
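Feature family (b) compares language models with KL-divergence. The abstract does not say which texts the models are built from or how they are smoothed, so the Python sketch below makes its own assumptions: additively smoothed unigram models, one built from the query words seen in a time period and one per candidate URL, with the divergence used as a feature. The names `unigram_lm`, `kl_divergence`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, mu=1.0):
    """Additively smoothed unigram language model over a fixed vocabulary.

    The smoothing scheme is an assumption; the thesis may use another one.
    """
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must be defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


# Hypothetical toy data: words of queries issued in one time period, and
# words of queries that led to clicks on each candidate URL.
period_tokens = "christmas gift ideas christmas tree".split()
url_tokens = {
    "url_a": "christmas gift shop christmas".split(),
    "url_b": "tax form download".split(),
}

vocab = set(period_tokens).union(*url_tokens.values())
period_lm = unigram_lm(period_tokens, vocab)

# One feature value per URL: a smaller divergence means a closer match.
scores = {
    url: kl_divergence(period_lm, unigram_lm(tokens, vocab))
    for url, tokens in url_tokens.items()
}
ranked = sorted(scores, key=scores.get)
```

Smoothing matters here: without it, any word missing from a URL's model would give a zero probability and make the divergence undefined.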
For the Rare Query dataset, where click information is unavailable, we propose two further kinds of features: (a) clicks extracted from related queries and (b) Time-based Page Rank. A subset of these features is then combined as input to SVM Regression for prediction.
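One way to picture a time-based Page Rank feature is a random walk over a query-URL click graph whose edge weights count only the clicks from the time period of interest. The abstract does not describe the graph construction, so everything in the Python sketch below is an assumption: the bipartite graph, the monthly buckets, the damping factor, and the names `weighted_pagerank` and `click_graph`. The click counts and hostnames are made up for illustration.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a weighted directed graph.

    edges maps (source, target) -> non-negative weight.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = defaultdict(float)
    for (src, _), w in edges.items():
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            if w > 0:
                new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


def click_graph(clicks, month):
    """Build query<->URL edges weighted by the clicks made in one month."""
    edges = defaultdict(float)
    for (query, url, m), n in clicks.items():
        if m == month:
            edges[(query, url)] += n
            edges[(url, query)] += n
    return dict(edges)


# Hypothetical click log: (query, url, month) -> number of clicks.
clicks = {
    ("tax forms", "irs.example", 4): 30,
    ("tax forms", "blog.example", 4): 5,
    ("tax forms", "irs.example", 12): 2,
    ("tax forms", "blog.example", 12): 6,
}

april = weighted_pagerank(click_graph(clicks, 4))
december = weighted_pagerank(click_graph(clicks, 12))
```

In this toy log the same query favours a different URL in April than in December, which is the kind of time-dependent signal a re-ranker could consume as a feature.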
Our experimental results show that the proposed approach gains a 10.28% improvement over the original ranking in the AOL query log on the Existed Query dataset. On the Rare Query dataset, SVM Regression gains a 1.14% improvement on Existed queries and a 12.9% improvement on Non-Existed queries.
Finally, we analyze the improvement from each method and discuss the pros and cons of these methods.

Acknowledgement
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Problem Definition
1.4 Proposed Solution
1.5 Contribution
1.6 Paper Organization
Chapter 2 Related Works
2.1 How web search is strongly influenced by time?
2.2 Using time sensitive information to improve search result quality
2.3 Query suggestion using time sensitive information
Chapter 3 Methodology
3.1 System Overview
3.2 Feature Generation Methodologies when the URL document exists in training data (Existed Query)
3.2.1 Probabilistic Prior
3.2.2 Probabilistic Model with Language Model and KL-divergence
3.2.3 Page Rank and Time Based Page Rank
3.3 SVM Regression on Existed Query
3.4 Feature Extraction when the document does not exist in the training data (Rare Query)
3.5 SVM Regression on rare query
Chapter 4 Experiments
4.1 Data Sets
4.2 Re-Ranking Method
4.3 Evaluation Measures
4.4 Experiment results of Extract Features from Methodologies on Existed Query
4.4.1 Probabilistic prior result on Existed Query
4.4.2 Probabilistic Model using KL-divergence and Language Model similarity measure on Existed Query
4.4.3 Page Rank based improvement on Existed Query
4.5 SVM Regression improvement on Existed Query
4.6 SVM Regression improvement on rare query dataset
4.7 Discussion
4.7.1 The performance between Probabilistic Prior and Probabilistic Model on Existed Query
4.7.2 Methodology improvement on original query dataset
4.7.3 Methodology improvement on rare query dataset
4.7.4 Dataset discussion on Existed Query
4.7.5 Dataset discussion on Non-Existed Query
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future work
Chapter 6 References

