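Graduate student: 葉柏廷
Graduate student (English name): Bo-Ting Yeh
Thesis title (Chinese): 以潛藏主題分析為基礎的Web查詢詞分類之研究 (A Study of Web Query Classification Based on Latent Topic Analysis)
Thesis title (English): Web Query Classification based on Latent Topic Analysis
Advisor: 楊正仁
Advisor (English name): Cheng-Zen Yang
Degree: Master's
Institution: Yuan Ze University
Department: Department of Computer Science and Engineering
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2011
Graduation academic year: 99 (ROC calendar, i.e., the 2010–2011 academic year)
Language: English
Number of pages: 37
Chinese keywords: query; classification; query classification; information extraction and retrieval
English keywords: query; classification; query classification; information retrieval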
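Web search engines today play a very important role, helping people effectively find the
information they want in huge amounts of Web data. Query classification is an important
issue in search engine technology; its task is to classify queries correctly into the
relevant categories. We face two main difficulties in query classification. First, most
query strings are short and ambiguous. Second, many queries contain two or more user
intentions. This study therefore proposes a method that first uses multiple search engines
to expand short queries and then extracts, from this expanded information, the multiple
topics a query may cover, using Latent Dirichlet Allocation (LDA) to uncover the latent
semantics in that information. Compared with the method proposed by Shen et al. in 2005,
the proposed method improves precision by 6.5% and F1 by 6.6%. The experiments demonstrate
that our method can effectively improve query classification performance.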
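The abstract describes the pipeline only at a high level, so the following is a minimal sketch of one plausible reading of it, not the thesis's implementation. It assumes each query has already been enriched with words taken from search-engine result snippets (the enrichment step itself is not shown), that each target category is represented by a hypothetical bag-of-words profile, and that categories are ranked by cosine similarity between LDA topic proportions. The bibliography lists JGibbLDA [20], a Java Gibbs-sampling LDA tool; the small collapsed Gibbs sampler here is an independent stand-in for it, and the names `train_lda`, `cosine`, `classify`, the priors `alpha=0.1` and `beta=0.01`, and the toy data are all illustrative assumptions.

```python
import math
import random


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=300, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return each document's topic proportions.

    docs: list of token lists. alpha/beta: symmetric Dirichlet priors (assumed values).
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    V = len(vocab)
    topics = range(num_topics)
    ndk = [[0] * num_topics for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in topics]             # occurrences of word w assigned to topic k
    nk = [0] * num_topics                       # total tokens assigned to topic k
    z = []                                      # current topic of every token
    for d, doc in enumerate(docs):              # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = vocab[w], z[d][i]
                ndk[d][k] -= 1                  # take the token out of the counts
                nkw[k][wi] -= 1
                nk[k] -= 1
                # p(z = t | all other assignments) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1                  # put it back under the sampled topic
                nkw[k][wi] += 1
                nk[k] += 1
    # Estimate theta_dk = (n_dk + alpha) / (|d| + K * alpha).
    return [[(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
            for d, doc in enumerate(docs)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(enriched_query, category_docs, num_topics=2, top_n=3):
    """Rank categories by similarity of their topic proportions to the query's (hypothetical scheme)."""
    names = list(category_docs)
    # Fit LDA jointly on the category profiles plus the enriched query (last document).
    theta = train_lda([category_docs[n] for n in names] + [enriched_query], num_topics)
    scores = {n: cosine(theta[-1], theta[i]) for i, n in enumerate(names)}
    return sorted(names, key=scores.get, reverse=True)[:top_n]


# Hypothetical category profiles and an already-enriched query.
categories = {
    "Computers": "software hardware laptop cpu software driver".split(),
    "Sports": "football soccer goal team league football".split(),
}
enriched = "thinkpad laptop cpu review software".split()  # words assumed to come from snippets
print(classify(enriched, categories, top_n=1))
```

Fitting LDA jointly on the category profiles and the query is a transductive shortcut chosen only to keep the sketch short; a larger system would more plausibly fit topics once on category data and infer proportions for each new query. On inputs this tiny, the sampled topics can also vary with the seed.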

Nowadays, Web search engines play an important role in helping people effectively find
information in massive amounts of Web data. Web query classification (WQC) is a crucial
issue in search engine technology: its task is to classify Web queries into relevant Web
categories. The WQC problem poses two major difficulties. First, most queries are short
and ambiguous. Second, many queries carry more than one user intention. This research
therefore proposes a scheme that exploits multiple search engines to enrich user queries
and then uses the Latent Dirichlet Allocation (LDA) model to extract multiple latent topics
from the enriched queries for query classification. The experiments show that, compared
with the scheme proposed by Shen et al. in 2005, our approach improves precision by 6.5%
and F1 by 6.6%, indicating that the proposed LDA-based scheme can effectively improve
WQC performance.
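The improvements above are stated in precision and F1. As an illustrative assumption based on the KDD Cup 2005 evaluation described in [19], the sketch below computes micro-averaged scores for multi-label query tags, P = (correct tags) / (predicted tags), R = (correct tags) / (gold tags), F1 = 2PR / (P + R), and then averages each score over several human labelers' answer sets. The function names and toy data are hypothetical, and Section 4.3 (Evaluation Criteria) of the thesis may define the criteria differently.

```python
def precision_recall_f1(predicted, gold):
    """Micro-averaged precision, recall and F1 for multi-label query tagging.

    predicted, gold: dicts mapping each query string to a set of category names.
    """
    queries = set(predicted) | set(gold)
    # Number of (query, category) pairs present in both the prediction and the gold set.
    correct = sum(len(predicted.get(q, set()) & gold.get(q, set())) for q in queries)
    n_predicted = sum(len(tags) for tags in predicted.values())
    n_gold = sum(len(tags) for tags in gold.values())
    p = correct / n_predicted if n_predicted else 0.0
    r = correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def average_over_labelers(predicted, gold_sets):
    """Average (P, R, F1) over several labelers' gold sets, scoring each labeler separately."""
    scores = [precision_recall_f1(predicted, gold) for gold in gold_sets]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Toy usage with hypothetical queries, categories and two labelers.
predicted = {"thinkpad": {"Computers", "Shopping"}, "world cup": {"Sports"}}
labeler_1 = {"thinkpad": {"Computers"}, "world cup": {"Sports", "Travel"}}
labeler_2 = {"thinkpad": {"Computers", "Shopping"}, "world cup": {"Sports"}}
# Against labeler_1 every score is 2/3, against labeler_2 every score is 1,
# so each averaged score is approximately 0.833 (= 5/6).
print(average_over_labelers(predicted, [labeler_1, labeler_2]))
```

Scoring against each labeler separately and then averaging, rather than merging the labelers' answers, keeps disagreements between human judges from inflating either precision or recall.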

1 Introduction
1.1 Background
1.2 Problem Definition and Challenges
1.3 Overview of the Research Method
1.4 Thesis Organization
2 Related Work
2.1 Query Analysis and Classification
2.1.1 Classification of Query Types
2.1.2 Classification based on a Web Taxonomy
2.2 Latent Dirichlet Allocation
2.2.1 Latent Semantic Analysis
2.2.2 Graph Representations
2.2.3 LDA for Document Classification
2.3 Background of the KDD-Cup 2005 Dataset
3 Query Classification
3.1 Enrichment on Queries
3.2 Building the Target Taxonomy
3.3 Classification
3.4 Summary
4 Experiment and Discussion
4.1 The Experimental Data
4.2 Collected Data
4.3 Evaluation Criteria
4.4 The Experimental Results
4.4.1 Effect of LDA
4.4.2 Comparison with Previous Approaches
4.5 Summary
5 Conclusions
Bibliography

[1] Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, 1st ed. ACM and Addison Wesley Longman Ltd., 1999.
[2] Steven M. Beitzel, Eric C. Jensen, David D. Lewis, and Abdur Chowdhury, “Automatic
Web Query Classification Using Labeled and Unlabeled Training Data,” in
Proceedings of the 28th Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval (SIGIR 2005), Salvador, Brazil, August
2005, pp. 581–582.
[3] Steven M. Beitzel and David D. Lewis, “Improving Automatic Query Classification
via Semi-supervised Learning,” in Proceedings of the 5th IEEE International
Conference on Data Mining (ICDM’05), 2005, pp. 42–49.
[4] N. J. Belkin, D. Kelly, G. Kim, J.-Y. Kim, H.-J. Lee, G. Muresan, M.-C. Tang,
X.-J. Yuan, and C. Cool, “Query Length in Interactive Information Retrieval,” in
Proceedings of the 26th Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval (SIGIR 2003), 2003, pp. 205–212.
[5] David M. Blei, Andrew Y. Ng, and Michael I. Jordan, “Latent Dirichlet Allocation,”
The Journal of Machine Learning Research, vol. 3, pp. 993–1022, Mar. 2003.
[6] Andrei Broder, “A Taxonomy of Web Search,” ACM SIGIR Forum, vol. 36, no. 2,
pp. 3–10, Sep. 2002.
[7] Peter D. Bruza and Simon Dennis, “Query Reformulation on the Internet: Empirical
Data and the Hyperindex Search Engine,” in Proceedings of the 5th Conference on Computer
Assisted Information Searching on Internet (RIAO 1997), 1997, pp. 488–499.
[8] Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen,
and Qiang Yang, “Context-Aware Query Classification,” in Proceedings of the 32nd
International ACM SIGIR Conference on Research and Development in Information
Retrieval (SIGIR 2009), Boston, Massachusetts, USA, July 2009, pp. 3–10.
[9] Eustache Diemert and Gilles Vandelle, “Unsupervised Query Categorization Using
Automatically-built Concept Graphs,” in Proceedings of the 18th International Conference
on World Wide Web (WWW 2009), 2009, pp. 461–470.
[10] Christiane Fellbaum, Rendee Tengi, Lavanya Jose, Peter Schulam, Isaac Julien, and
George A. Miller, “WordNet,” 2011, online: http://wordnet.princeton.edu/, accessed
12-July 2011.
[11] Natalie S. Glance, “Community Search Assistant,” in Proceedings of the 6th International
Conference on Intelligent User Interfaces (IUI 2001), 2001, pp. 91–96.
[12] Luis Gravano, Vasileios Hatzivassiloglou, and Richard Lichtenstein, “Categorizing
Web Queries According to Geographical Locality,” in Proceedings of the 12th International
Conference on Information and Knowledge Management (CIKM 2003),
New Orleans, Louisiana, USA, November 2003, pp. 325–333.
[13] Jian Hu, Gang Wang, Fred Lochovsky, Jian-tao Sun, and Zheng Chen, “Understanding
User’s Query Intent with Wikipedia,” in Proceedings of the 18th International
Conference on World Wide Web (WWW 2009), 2009, pp. 471–480.
[14] Bernard J. Jansen, Danielle L. Booth, and Amanda Spink, “Determining the Informational,
Navigational, and Transactional Intent of Web Queries,” Information
Processing and Management, vol. 44, no. 3, pp. 1251–1266, May 2008.
[15] Bernard J. Jansen, Amanda Spink, and Tefko Saracevic, “Real Life, Real Users,
and Real Needs: A Study and Analysis of User Queries on the Web,” Information
Processing and Management, vol. 36, no. 2, pp. 207–227, Jan. 2000.
[16] Zsolt T. Kardkovács, Domonkos Tikk, and Zoltán Bánsághi, “The Ferrety Algorithm
for the KDD Cup 2005 Problem,” ACM SIGKDD Explorations Newsletter, vol. 7,
no. 2, pp. 111–116, 2005.
[17] Uichin Lee, Zhenyu Liu, and Junghoo Cho, “Automatic Identification of User Goals
in Web Search,” in Proceedings of the 14th International Conference on World Wide
Web (WWW 2005), Chiba, Japan, May 2005, pp. 391–400.
[18] Ying Li and Zijian Zheng, “KDD-Cup 2005 - Internet User Search Query Categorization,”
2005, online: http://www.acm.org/sigkdd/kddcup/, accessed 12-July 2011.
[19] Ying Li, Zijian Zheng, and Honghua (Kathy) Dai, “KDD CUP-2005 Report: Facing
a Great Challenge,” ACM SIGKDD Explorations Newsletter, vol. 7, no. 2, pp. 91–99,
2005.
[20] Xuan-Hieu Phan and Cam-Tu Nguyen, “JGibbLDA,” 2008, online: http://jgibblda.sourceforge.net/,
accessed 12-July 2011.
[21] Mark Sanderson, “Retrieving with Good Sense,” Information Retrieval, vol. 2, no. 1,
pp. 49–69, Feb. 2000.
[22] Dou Shen, Ying Li, and Dengyong Zhou, “Product Query Classification,” in Proceedings
of the 18th ACM Conference on Information and Knowledge Management
(CIKM 2009), 2009, pp. 741–750.
[23] Dou Shen, Rong Pan, Jian-Tao Sun, Jeffrey Junfeng Pan, Kangheng Wu, Jie Yin,
and Qiang Yang, “Q2C@UST: Our Winning Solution to Query Classification in
KDDCUP 2005,” ACM SIGKDD Explorations Newsletter, vol. 7, no. 2, pp. 100–
110, 2005.
[24] Dou Shen, Rong Pan, Jian-Tao Sun, Jeffrey Junfeng Pan, Kangheng Wu, Jie Yin, and
Qiang Yang, “Query Enrichment for Web-query Classification,” ACM Transactions
on Information Systems, vol. 24, no. 3, pp. 320–352, Jul. 2006.
[25] Dou Shen, Jian-Tao Sun, Qiang Yang, and Zheng Chen, “Building Bridges for Web
Query Classification,” in Proceedings of the 29th Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval (SIGIR 2006),
2006, pp. 131–138.
[26] Karen Sparck Jones and Peter Willett, Eds., Readings in Information Retrieval, 1st ed.
Morgan Kaufmann Publishers, Inc., 1997.
[27] David Vogel, Steffen Bickel, Peter Haider, Rolf Schimpfky, Peter Siemen, Steve
Bridges, and Tobias Scheffer, “Classifying Search Engine Queries Using the Web as
Background Knowledge,” ACM SIGKDD Explorations Newsletter, vol. 7, no. 2, pp.
117–122, 2005.
[28] Oren Zamir and Oren Etzioni, “Grouper: A Dynamic Clustering Interface to Web
Search Results,” Computer Networks, vol. 31, pp. 1361–1374, 1999.




Full text: the electronic full text is available only within the on-campus system and IP range of the graduate student's university.