
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

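Student: 戴嘉宏
Student (English name): Chia-Hung Tai
Thesis title (Chinese): 模糊群集式查詢擴展技術
Thesis title (English): Fuzzy Cluster-Based Query Expansion
Advisor: 魏志平
Advisor (English name): Chih-ping Wei
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Information Management (graduate program)
Discipline: Computing
Subdiscipline: General computing
Thesis type: Academic thesis
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar, i.e., 2003-2004)
Language: English
Number of pages: 69
Keywords (Chinese, translated): word mismatch; fuzzy clustering; cluster-based query expansion; information retrieval; text mining; query expansion; document clustering; term association; fuzzy cluster-based query expansion
Keywords (English): Fuzzy clustering; Fuzzy cluster-based query expansion; Term association; Information retrieval; Word mismatch; Query expansion; Document clustering; Cluster-based query expansion; Thesaurus; Text mining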
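With the rapid development of the Internet and information technology, more and more information appears online in the form of text documents. Information retrieval (IR) refers to returning relevant documents to a user according to the query the user submits. Word mismatch, which occurs when a user describes a concept with keywords different from those used in the documents, is a challenge for information retrieval, and query expansion is one method for dealing with it.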
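To make word mismatch concrete, here is a minimal vector-space retrieval sketch. It assumes TF-IDF term weighting and cosine similarity, a standard ranking scheme chosen for illustration rather than the retrieval model documented in the thesis; the toy corpus and the helper names `tfidf_vectors`, `cosine`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch only: TF-IDF + cosine ranking is an assumed model,
# not necessarily the one used in the thesis.

def tfidf_vectors(docs):
    """Return TF-IDF vectors (term -> weight dicts) and the IDF table for tokenized docs."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query, docs):
    """Rank documents by cosine similarity to the query; returns (ranking, scores)."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection receive zero weight.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [cosine(q, d) for d in vectors]
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranking, scores

docs = [
    ["automobile", "engine", "repair"],  # on topic, but says "automobile"
    ["car", "engine", "repair"],
    ["pasta", "recipe", "sauce"],
]
ranking, scores = retrieve(["car"], docs)
print(ranking, [round(s, 3) for s in scores])
```

Document 0 covers the same topic as document 1 but scores 0.0, because it says "automobile" where the query says "car". This is the gap that query expansion tries to close.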

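In this thesis, we therefore propose a fuzzy cluster-based query expansion technique to address word mismatch, and we use existing query expansion techniques (namely, global analysis and the cluster-based query expansion technique) as our benchmarks. Our empirical results show that the fuzzy cluster-based query expansion technique provides more accurate query results than the existing query expansion techniques.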
Advances in information and network technologies have fostered the creation and availability of a vast amount of online information, typically in the form of text documents. Information retrieval (IR) pertains to determining the relevance between a user query and the documents in a target collection, then returning those documents that are likely to satisfy the user's information needs. One challenging issue in IR is word mismatch, which occurs when the same concept is described with different words in a user's query and in the documents. Query expansion, which adds related terms to the original query, is a promising approach for dealing with word mismatch in IR.
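The idea behind query expansion can be sketched with a simple collection-wide (global) term-association table: terms that tend to occur in the same documents as the query terms are appended to the query. The association measure here (cosine similarity of binary term-document vectors), the summed scoring rule, and all function names are assumptions for illustration; this is not the specific global-analysis method the thesis uses as a benchmark.

```python
import math
from collections import defaultdict

# Hypothetical sketch of co-occurrence-based global query expansion.

def term_postings(docs):
    """Map each term to the set of indices of documents that contain it."""
    postings = defaultdict(set)
    for i, doc in enumerate(docs):
        for term in doc:
            postings[term].add(i)
    return postings

def association(docs_a, docs_b):
    """Cosine similarity of two binary term-document occurrence vectors."""
    if not docs_a or not docs_b:
        return 0.0
    return len(docs_a & docs_b) / math.sqrt(len(docs_a) * len(docs_b))

def expand_query(query, docs, k=2):
    """Append the k collection-wide terms most associated with the whole query."""
    postings = term_postings(docs)
    scores = {}
    for term, term_docs in postings.items():
        if term in query:
            continue
        # Score a candidate by its summed association with every query term.
        score = sum(association(postings.get(q, set()), term_docs) for q in query)
        if score > 0:
            scores[term] = score
    # Highest score first; ties broken alphabetically for a deterministic result.
    best = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return list(query) + best

docs = [
    ["car", "automobile", "vehicle"],
    ["car", "automobile", "engine"],
    ["automobile", "repair"],
    ["pasta", "recipe"],
]
print(expand_query(["car"], docs, k=1))  # ['car', 'automobile']
```

After expansion, a document that mentions only "automobile" can share a term with the query. Because the association table is built from the whole collection, this resembles the global approach; judging by the thesis outline, cluster-based techniques instead build such thesauri locally within document clusters.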

In this thesis, we develop a fuzzy cluster-based query expansion technique to address the word mismatch problem. Using existing expansion techniques (i.e., global analysis and non-fuzzy cluster-based query expansion) as performance benchmarks, our empirical results suggest that the fuzzy cluster-based query expansion technique provides more accurate query results than the benchmark techniques.
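Fuzzy document clustering (Section 3.2.1 of the thesis) lets a document belong to several clusters with different degrees instead of exactly one. The sketch below uses fuzzy c-means, a widely used fuzzy clustering algorithm, purely as an assumed illustration: this record does not state the thesis's actual algorithm, distance measure, fuzzifier (here m = 2), or stopping rule. For simplicity it uses Euclidean distance on small dense vectors; real document vectors would be high-dimensional term weights. The function name `fuzzy_c_means` is hypothetical.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means clustering (illustrative sketch, not the thesis's code).

    Returns (centers, memberships), where memberships[i][j] is the degree to
    which point i belongs to cluster j; each row of memberships sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(max_iter):
        # Each center is the mean of all points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w) or 1.0
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Standard FCM update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            zero = [j for j in range(c) if dists[j] == 0.0]
            if zero:
                # A point sitting exactly on a center belongs fully to it.
                row = [1.0 / len(zero) if j in zero else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                                 for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        delta = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if delta < tol:
            break
    return centers, u

# Toy 2-D "document vectors": two well-separated groups.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centers, u = fuzzy_c_means(points, c=2)
for row in u:
    print([round(x, 3) for x in row])
```

Each row of `u` gives one document's membership across clusters. One plausible use in cluster-based expansion, stated here as an assumption, is to build a local thesaurus per cluster and weight each document's contribution by its membership, so documents near a cluster boundary can inform more than one local thesaurus.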
CHAPTER 1. INTRODUCTION
1.1 Background
1.2 Research Motivation and Objectives
1.3 Organization of the Thesis
CHAPTER 2. LITERATURE REVIEW
2.1 Query Expansion Methods
2.1.1 Global analysis
2.1.2 Local feedback
2.1.3 Non-fuzzy cluster-based query expansion techniques
2.2 Thesaurus Construction Techniques
2.3 Document Clustering
2.4 Fuzzy Clustering and Fuzzy Document Clustering
CHAPTER 3. DEVELOPMENT OF A FUZZY CLUSTER-BASED QUERY EXPANSION TECHNIQUE
3.1 Process of the Fuzzy Cluster-Based Query Expansion Technique
3.2 Thesauri Construction Process
3.2.1 Fuzzy document clustering
3.2.2 Local thesaurus construction
3.3 Query Process
3.3.1 Local query expansion
3.3.2 Document retrieval
CHAPTER 4. EMPIRICAL EVALUATION
4.1 Data Collection
4.2 Evaluation Procedure and Criteria
4.3 Benchmark Techniques
4.4 Evaluation Results
4.4.1 Effects of number of document clusters
4.4.2 Comparative evaluation
4.4.3 Effects of the number of query terms
CHAPTER 5. CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS
REFERENCES
APPENDIX A: CANDIDATE TERMS FOR INFORMATION RETRIEVAL
APPENDIX B: INTERSECTION INFORMATION