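Student: 楊敦淇
Student (English name): Duen-Chi Yang
Thesis title: 應用相關資訊回饋於貝氏混合式機率檢索模型
Thesis title (English): Using Relevance Feedback in Bayesian Probabilistic Mixture Retrieval Model
Advisor: 簡仁宗
Advisor (English name): Jen-Tzung Chien
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Computer Science and Information Engineering (master's and doctoral programs)
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2003
Academic year of graduation: 91 (ROC calendar, i.e. 2002-2003)
Language: Chinese
Pages: 75
Keywords: information retrieval; probabilistic mixture retrieval model; relevance feedback; language model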
Statistics:
  • Times cited: 5
  • Views: 154
  • Downloads: 24
  • Bookmarks: 2
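The purpose of this thesis is to improve the efficiency of text document retrieval and to reduce the cost of repeatedly screening and filtering data. It proposes a relevance feedback method that requires no user intervention and can be applied within an information retrieval system to improve retrieval performance. Traditionally, relevance feedback techniques mainly comprise query expansion and query term reweighting. This thesis builds on a retrieval system based on the mixture of N-gram model. To strengthen retrieval performance, we not only apply query expansion but also emphasize the differing importance of individual query terms, effectively incorporating query term reweighting. In addition, we use the top N documents scored in the previous retrieval round, together with each document in the database, to re-estimate new document language models, so that better document language models are available at retrieval time. Both query term reweighting and document language model adaptation use maximum likelihood estimation (MLE) as the parameter estimation criterion, with the EM (Expectation-Maximization) algorithm used to estimate the optimal parameter set. The thesis further proposes maximum a posteriori (MAP) adaptation of the document language models in the retrieval model. In the experiments, the Xinhua News Agency documents from TDT2 serve as the test collection. Applied individually, both query term reweighting and document language model adaptation improve retrieval accuracy. Combining the feedback methods is more effective than query expansion alone. When the document language models are adapted with MAP estimation instead of MLE, the final retrieval accuracy is better still.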
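The abstract does not spell out the retrieval formula, so the following is only a minimal sketch under stated assumptions. Each document is scored by a query likelihood in which every query term q_i mixes a document model P(q_i|D) with a collection model P(q_i|C) through its own weight λ_i: score(D) = Σ_i log[λ_i P(q_i|D) + (1 - λ_i) P(q_i|C)]. Query term reweighting is then shown as EM over the top-N feedback documents D_1..D_N, similar in spirit to the term-specific smoothing EM described by Hiemstra: λ_i ← (1/N) Σ_j λ_i P(q_i|D_j) / [λ_i P(q_i|D_j) + (1 - λ_i) P(q_i|C)]. The unigram components (instead of higher-order N-grams), the names `unigram`, `score` and `em_term_weights`, and the `UNSEEN_FLOOR` constant are hypothetical choices for illustration, not the thesis's implementation; query expansion is not shown.

```python
import math
from collections import Counter

# Hypothetical floor so that terms unseen in the collection do not give log(0).
UNSEEN_FLOOR = 1e-9


def unigram(tokens):
    """Maximum-likelihood unigram model: P(w) = c(w) / |tokens|."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def score(query, doc_model, coll_model, weights):
    """log P(Q|D) = sum_i log(lambda_i P(q_i|D) + (1 - lambda_i) P(q_i|C)).

    weights: one mixture weight per query term, each in [0, 1).
    """
    total = 0.0
    for term, lam in zip(query, weights):
        p_doc = doc_model.get(term, 0.0)
        p_coll = coll_model.get(term, UNSEEN_FLOOR)
        total += math.log(lam * p_doc + (1.0 - lam) * p_coll)
    return total


def em_term_weights(query, feedback_models, coll_model, init=0.5, iterations=20):
    """EM estimate of one weight lambda_i per query term, maximising
    prod_j [lambda_i P(q_i|D_j) + (1 - lambda_i) P(q_i|C)] over the
    feedback documents D_1..D_N (init must lie strictly between 0 and 1)."""
    weights = [init] * len(query)
    for _ in range(iterations):
        updated = []
        for term, lam in zip(query, weights):
            p_coll = coll_model.get(term, UNSEEN_FLOOR)
            # E-step: posterior that the term came from the document component.
            resp = 0.0
            for doc_model in feedback_models:
                from_doc = lam * doc_model.get(term, 0.0)
                from_coll = (1.0 - lam) * p_coll
                resp += from_doc / (from_doc + from_coll)
            # M-step: the new weight is the average posterior.
            updated.append(resp / len(feedback_models))
        weights = updated
    return weights
```

A term that is frequent in the feedback documents but rarer in the collection drifts toward a weight near 1 and dominates the ranking, while a term absent from every feedback document drops to 0 and then contributes only log P(q_i|C), which is the same for every document.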
This thesis studies a procedure for improving the efficiency of text retrieval, reducing the cost for users seeking information. A new relevance feedback technique is developed and applied to information retrieval, with a focus on automatically adapting model parameters to improve retrieval performance. Traditionally, query expansion and query term reweighting are the two most popular relevance feedback approaches. In this study, the retrieval framework is based on the mixture N-gram model. To improve retrieval performance, we apply both query expansion and query term reweighting for relevance feedback. Furthermore, the top N documents retrieved in the previous iteration are used to adapt each document language model into a new, query-relevant model that incorporates more information and is more useful in the retrieval process. Both query term reweighting and document language model adaptation use maximum likelihood (ML) estimation, with the Expectation-Maximization (EM) algorithm used to estimate the best parameters. In the experiments, the Xinhua news documents of the TDT2 corpus are used. Query term reweighting and document model adaptation each yield improved results. Combining the three relevance feedback approaches improves the results further compared with any individual approach. Finally, maximum a posteriori (MAP) language model adaptation achieves better retrieval performance than ML adaptation.
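MAP adaptation of a document language model can be illustrated with a Dirichlet prior. In this sketch, which is assumed rather than taken from the thesis, the prior for document D is centred on its original model P(w|D) with strength τ, and the adaptation data are the words of the top-N feedback documents. With Dirichlet parameters α_w = τ·P(w|D) + 1, the posterior mode is P(w) = (c(w) + τ·P(w|D)) / (N + τ), where c(w) is the count of w in the feedback data and N is their total length. As τ approaches 0 this reduces to the ML estimate from the feedback data alone, while a large τ keeps the model close to the original document. The function name `map_adapt` and the parameter `tau` are hypothetical.

```python
from collections import Counter


def map_adapt(prior_model, feedback_docs, tau):
    """MAP-adapted unigram model under a Dirichlet prior (illustrative sketch).

    prior_model:   dict word -> P(w|D), the original document model
    feedback_docs: list of token lists, e.g. the top-N documents of the previous round
    tau:           prior strength; tau = 0 gives the plain ML estimate of the feedback data

    Returns the posterior mode (c(w) + tau * P(w|D)) / (N + tau), which corresponds
    to Dirichlet parameters alpha_w = tau * P(w|D) + 1.
    """
    counts = Counter(w for doc in feedback_docs for w in doc)
    n = sum(counts.values())
    if n + tau == 0:
        raise ValueError("need feedback data or a positive tau")
    vocab = set(prior_model) | set(counts)
    return {
        w: (counts.get(w, 0) + tau * prior_model.get(w, 0.0)) / (n + tau)
        for w in vocab
    }
```

For example, with prior {"a": 0.5, "b": 0.5}, feedback documents [["a", "c"], ["c", "c"]] and τ = 4, the adapted model is P(a) = 0.375, P(b) = 0.25, P(c) = 0.375; it still sums to 1 because the prior does.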
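Abstract
List of Figures
List of Tables
List of Symbols
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Approach
Chapter 2 Retrieval Frameworks and Relevance Feedback
2.1 Traditional Retrieval Models
2.2 Relevance Feedback
2.3 Feedback in Vector Space Retrieval
2.4 Feedback in Probabilistic Retrieval
2.5 Performance Evaluation
Chapter 3 Relevance Feedback for Bayesian Probabilistic Mixture Retrieval
3.1 Introduction to N-gram Models
3.2 Document Retrieval Using Language Models
3.3 Retrieval with Probabilistic Mixture Models
3.4 Overview of the Proposed Feedback Framework
3.5 Query Expansion
3.6 Query Term Reweighting
3.7 Bayesian Adaptation of Mixture Document Language Model
Chapter 4 Experiments
4.1 Experimental Setup
4.2 Test Collection
4.3 Experimental Results
Chapter 5 Conclusions and Future Work
References
Appendix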
[1] Gianni Amati and Cornelis Joost Van Rijsbergen, “Probabilistic Models of Information Retrieval Based on Measuring the Divergence from Randomness”, ACM Transactions on Information Systems, Vol. 20, No. 4, pages 357-389, October 2002.
[2] Pedro Amo, Francisco L. Ferreras, Fernando Cruz, and Manuel Rosa, “Smoothing Functions for Automatic Relevance Feedback in Information Retrieval”, In Proceedings of the 11th International Workshop on Database and Expert Systems Applications, pages 115-119, 2000.
[3] Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley Longman, pages 118-123, May 1999.
[4] Jerome R. Bellegarda, “An Overview of Statistical Language Model Adaptation”, Adaptation Methods for Speech Recognition Workshop 2001, Sophia-Antipolis, France, August 2001.
[5] Adam Berger and John Lafferty, “Information Retrieval as Statistical Translation”, In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 222-229, 1999.
[6] Chris Buckley and Gerard Salton, “Optimization of Relevance Feedback Weights”, In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 351-357, 1995.
[7] Claudio Carpineto, Renato De Mori, Giovanni Romano, and Brigitte Bigi, “An Information-Theoretic Approach to Automatic Query Expansion”, ACM Transactions on Information Systems, Vol. 19, No. 1, pages 1-27, January 2001.
[8] Claudio Carpineto, Giovanni Romano, and Vittorio Giannini, “Improving Retrieval Feedback with Multiple Term-Ranking Function Combination”, ACM Transactions on Information Systems, Vol. 20, No. 3, pages 259-290, July 2002.
[9] Chia-Hui Chang and Ching-Chi Hsu, “Enabling Concept-Based Relevance Feedback for Information Retrieval on the WWW”, IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 4, pages 595-609, 1999.
[10] Stanley F. Chen and Joshua Goodman, “An Empirical Study of Smoothing Techniques for Language Modeling”, Computer Speech and Language, Vol. 13, pages 359-394, 1999.
[11] Berlin Chen, “Speech Information Retrieval for Mandarin Chinese Syllable-based Index Feature, Statistical Retrieval Models and Improved Approach”, Ph.D. Dissertation, 2001.
[12] Berlin Chen, Hsin-min Wang, and Lin-shan Lee, “An HMM/N-gram-based Linguistic Approach for Mandarin Spoken Document Retrieval”, In Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech 2001), Aalborg, Denmark, September 2001.
[13] Langzhou Chen, Jean-Luc Gauvain, Lori Lamel, Gilles Adda, and Martine Adda, “Using Information Retrieval Methods for Language Model Adaptation”, In Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech 2001), Aalborg, Denmark, September 2001.
[14] Jen-Tzung Chien and Hung-Ying Chen, “Association Rule-based Language Models for Discovering Long Distance Dependency in Chinese”, In Proceedings of Research on Computational Linguistics Conference XIV (ROCLING XIV), pages 43-63, Tainan, Taiwan, August 2001. (in Chinese)
[15] F. Crestani, M. Lalmas, Cornelis Joost Van Rijsbergen, and I. Campbell, “Is This Document Relevant? … Probably: A Survey of Probabilistic Models in Information Retrieval”, ACM Computing Surveys, Vol. 30, No. 4, pages 528-552, December 1998.
[16] W. Bruce Croft, Stephen Cronen-Townsend, and Victor Lavrenko, “Relevance Feedback and Personalization: A Language Modeling Perspective”, In Proceedings of the Joint DELOS-NSF Workshop on Personalization and Recommender Systems in Digital Libraries, pages 49-54, 2001.
[17] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, Series B, Vol. 39, No. 1, pages 1-38, 1977.
[18] Marcello Federico, “Bayesian Estimation Methods for N-gram Language Model Adaptation”, In Proceedings of the International Conference on Spoken Language Processing, pages 240-243, Philadelphia, 1996.
[19] Larry Fitzpatrick and Mei Dent, “Automatic Feedback Using Past Queries: Social Searching?”, In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 306-313, 1997.
[20] Frederick Jelinek, Statistical Methods for Speech Recognition, The MIT Press, Cambridge, Massachusetts, 1997.
[21] Norbert Fuhr and Chris Buckley, “Probabilistic Document Indexing from Relevance Feedback Data”, In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 45-62, September 1990.
[22] Norbert Fuhr, “Probabilistic Models in Information Retrieval”, The Computer Journal, Vol. 35, No. 3, pages 243-255, 1992.
[23] Jean-Luc Gauvain and Chin-Hui Lee, “Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains”, IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 4, pages 291-298, April 1994.
[24] Donna Harman, “Relevance Feedback Revisited”, In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1-10, 1992.
[25] Donna K. Harman, “Relevance Feedback and Other Query Modification Techniques”, In Information Retrieval: Data Structures and Algorithms, William B. Frakes and Ricardo Baeza-Yates, Eds., Prentice Hall, Englewood Cliffs, New Jersey, pages 241-263, 1992.
[26] Djoerd Hiemstra and Arjen P. de Vries, “Relating the New Language Models of Information Retrieval to the Traditional Retrieval Models”, CTIT Technical Report TR-CTIT-00-09, Centre for Telematics and Information Technology, 2000. (http://www.ctit.utwente.nl)
[27] Djoerd Hiemstra, “Using Language Models for Information Retrieval”, Ph.D. thesis, University of Twente, 2001.
[28] Djoerd Hiemstra and Stephen Robertson, “Relevance Feedback for Best Match Term Weighting Algorithms in Information Retrieval”, In Proceedings of the Joint DELOS-NSF Workshop on Personalisation and Recommender Systems in Digital Libraries, pages 37-42, June 2001.
[29] Djoerd Hiemstra, “Term-Specific Smoothing for the Language Modeling Approach to Information Retrieval: The Importance of a Query Term”, In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 35-41, 2002.
[30] Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Microsoft Research, Prentice Hall PTR, pages 73-132, 2001.
[31] Qiang Huo and Chin-Hui Lee, “On-Line Adaptive Learning of the Continuous Density Hidden Markov Model Based on Approximate Recursive Bayes Estimate”, IEEE Transactions on Speech and Audio Processing, Vol. 5, No. 2, pages 161-172, March 1997.
[32] Rukmini Iyer and Mari Ostendorf, “Modeling Long Distance Dependence in Language: Topic Mixtures Versus Dynamic Cache Models”, IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 1, pages 30-39, January 1999.
[33] R. Iyer and M. Ostendorf, “Relevance Weighting for Combining Multi-domain Data for N-gram Language Modeling”, Computer Speech and Language, Vol. 13, pages 267-282, 1999.
[34] Hongyan Jing and Evelyne Tzoukermann, “Information Retrieval Based on Context Distance and Morphology”, In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 90-96, 1999.
[35] John Lafferty and Chengxiang Zhai, “Document Language Models, Query Models, and Risk Minimization”, In Proceedings of the 24th ACM SIGIR Conference on Research and Development in Information Retrieval, pages 111-119, 2001.
[36] Victor Lavrenko and W. Bruce Croft, “Relevance-based Language Models”, In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 120-127, 2001.
[37] David R. H. Miller, Tim Leek, and Richard M. Schwartz, “A Hidden Markov Model Information Retrieval System”, In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 214-221, 1999.
[38] Mandar Mitra, Amit Singhal, and Chris Buckley, “Improving Automatic Query Expansion”, In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 4-11, 1998.
[39] K. Ng, “A Maximum Likelihood Ratio Information Retrieval Model”, In Proceedings of the 8th Text Retrieval Conference (TREC-8), pages 285-300, 1999.
[40] Jay M. Ponte and W. Bruce Croft, “A Language Modeling Approach to Information Retrieval”, In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275-281, 1998.
[41] Jay M. Ponte, “Language Models for Relevance Feedback”, In Advances in Information Retrieval, Ed. W. Bruce Croft, Dordrecht: Kluwer, pages 73-95, 2000.
[42] Yonggang Qiu and Hans-Peter Frei, “Concept-based Query Expansion”, In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 160-169, 1993.
[43] L. Rabiner and Biing-Hwang Juang, “An Introduction to Hidden Markov Models”, IEEE ASSP Magazine, Vol. 3, No. 1, pages 4-16, January 1986.
[44] S. Robertson and K. Sparck Jones, “Relevance Weighting of Search Terms”, Journal of the American Society for Information Science, Vol. 27, pages 129-146, 1976.
[45] S. E. Robertson, “On Term Selection for Query Expansion”, Journal of Documentation, Vol. 46, No. 4, pages 359-364, 1990.
[46] S. E. Robertson, S. Walker, and M. Beaulieu, “Okapi at TREC-7: Automatic Ad Hoc, Filtering, VLC and Interactive Track”, In Proceedings of the 7th Text Retrieval Conference (TREC-7), pages 253-264, 1999.
[47] S. E. Robertson and S. Walker, “Okapi/Keenbow at TREC-8”, In Proceedings of the 8th Text Retrieval Conference (TREC-8), pages 151-162, 1999.
[48] Ronald Rosenfeld, “Two Decades of Statistical Language Modeling: Where Do We Go from Here?”, Proceedings of the IEEE, Vol. 88, No. 8, pages 1270-1278, August 2000.
[49] S. E. Robertson and Djoerd Hiemstra, “Language Models and Probability of Relevance”, In Proceedings of the Workshop on Language Modeling and Information Retrieval, Carnegie Mellon University, USA, pages 21-25, May 2001.
[50] Gerard Salton, The SMART Retrieval System: Experiments in Automatic Document Processing, Prentice Hall Inc., Englewood Cliffs, New Jersey, 1971.
[51] Gerard Salton, A. Wong, and C. S. Yang, “A Vector Space Model for Automatic Indexing”, Communications of the ACM, Vol. 18, No. 11, pages 613-620, 1975.
[52] Gerard Salton and M. J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, New York, 1983.
[53] Gerard Salton and Chris Buckley, “Term-Weighting Approaches in Automatic Text Retrieval”, Information Processing and Management, Vol. 24, pages 513-523, 1988.
[54] Gerard Salton and Chris Buckley, “Improving Retrieval Performance by Relevance Feedback”, Journal of the American Society for Information Science, Vol. 41, No. 4, pages 288-297, 1990.
[55] A. F. Smeaton and Cornelis Joost Van Rijsbergen, “The Retrieval Effects of Query Expansion on a Feedback Document Retrieval System”, The Computer Journal, Vol. 26, No. 3, pages 239-246, 1988.
[56] F. Song and W. Bruce Croft, “A General Language Model for Information Retrieval”, In Proceedings of the 8th International Conference on Information and Knowledge Management (CIKM 1999), ACM Press, pages 93-96, 1999.
[57] Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, Academic Press, pages 13-54, 1999.
[58] S. K. M. Wong and Y. Y. Yao, “A Probability Distribution Model for Information Retrieval”, Information Processing and Management, Vol. 25, No. 1, pages 39-53, 1989.
[59] Ross Wilkinson, Justin Zobel, and Ron Sacks-Davis, “Similarity Measures for Short Queries”, In Proceedings of the 4th Text Retrieval Conference (TREC-4), pages 277-285, October 1995.
[60] Jinxi Xu and W. Bruce Croft, “Query Expansion Using Local and Global Document Analysis”, In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 4-11, 1996.
[61] Chengxiang Zhai and John Lafferty, “Model-based Feedback in the Language Modeling Approach to Information Retrieval”, In Proceedings of the 10th International Conference on Information and Knowledge Management (CIKM 2001), ACM Press, pages 403-410, 2001.
[62] Chengxiang Zhai and John Lafferty, “A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval”, In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 334-342, 2001.
[63] Chengxiang Zhai and John Lafferty, “Two-Stage Language Models for Information Retrieval”, In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 49-56, 2002.
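[64] 李建志, “應用混合式機率模型於新聞資訊檢索之研究” (A Study of Applying Mixture Probabilistic Models to News Information Retrieval), Master's thesis, Department of Computer Science and Information Engineering, National Cheng Kung University, 2002.
[65] CKIP (Chinese Knowledge and Information Processing) Group, Institute of Information Science, Academia Sinica, http://godel.iis.sinica.edu.tw.
[66] Google Search Help, http://www.google.com.tw/intl/zh-TW/help.html.
[67] Yahoo! Kimo Search Help, http://help.yahoo.com/help/tw/ysearch/.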