Author: 鍾明璇 (Ming-Hsuan Chung)
Title: Applying the Association Rules to Refine the VSM-based Document Clustering (應用關聯規則技術有效輔助以向量空間模型為基礎之文件群集法)
Advisor: 李維平 (Wei-Ping Lee)
Degree: Master's
Institution: Chung Yuan Christian University (中原大學)
Department: Graduate Institute of Information Management (資訊管理研究所)
Discipline: Computer Science
Subfield: General Computer Science
Thesis type: Academic thesis
Publication year: 2002
Academic year of graduation: 90 (ROC calendar)
Language: Chinese
Pages: 69
Keywords: Text Mining; Data Mining; Document Clustering; Vector Space Model; Association Rule
Information today grows as fast as dividing cells; the ability to retrieve, organize, present, and apply it efficiently will be the key to success.
Clustering automatically organizes and classifies data according to its features. Applied to document data, it can improve the precision or recall of information retrieval systems, help organize and present information efficiently, and automatically generate hierarchical document taxonomies (e.g., a directory of Web documents like the one provided by Yahoo!). Traditional document clustering involves two phases: (1) feature extraction, which maps each document to a point in the vector space model; and (2) a specific clustering algorithm, which groups those points into clusters. However, the vector space model has an inherent defect: it cannot capture the relationships between the terms in a document, which may cause errors in the subsequent clustering. This study therefore proposes using association rule mining, a data mining technique, to compensate for this inadequacy of traditional document clustering and improve the quality of the clusters.
This study uses association rule mining to discover the relationships between terms in documents and uses them to adjust the vector space model. We conducted experiments on the Reuters-21578 corpus comparing the proposed document clustering method with the traditional one; the results indicate that the proposed method produces higher-quality clusters. In future work, we plan to apply the method to other clustering algorithms based on the vector space model to further improve clustering quality.
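The abstract does not state how the mined rules modify the document vectors, so the following is only a minimal sketch under stated assumptions, not the thesis's method. It treats each document as a transaction, mines pairwise rules `a -> b` by support and confidence, and then (an assumed scheme) adds `confidence * weight(a)` to the weight of `b` in every document vector. All function names, thresholds, and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import sqrt

def tf_vector(tokens, vocab):
    """Map a tokenized document to a term-frequency vector over vocab."""
    counts = Counter(tokens)
    return [float(counts[t]) for t in vocab]

def mine_rules(docs, min_support, min_confidence):
    """Return {(a, b): confidence} for pairwise rules a -> b.

    Each document is treated as one transaction (its set of terms).
    """
    n = len(docs)
    term_sets = [set(d) for d in docs]
    df = Counter(t for s in term_sets for t in s)  # document frequency
    rules = {}
    for a, b in permutations(sorted(df), 2):
        both = sum(1 for s in term_sets if a in s and b in s)
        if both / n >= min_support and both / df[a] >= min_confidence:
            rules[(a, b)] = both / df[a]
    return rules

def adjust(vec, vocab, rules):
    """Assumed adjustment: spread each term's weight to its rule consequents."""
    index = {t: i for i, t in enumerate(vocab)}
    out = list(vec)
    for (a, b), conf in rules.items():
        # Read from the original vector so adjustments do not chain.
        out[index[b]] += conf * vec[index[a]]
    return out

def cosine(u, v):
    """Cosine similarity, with 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Toy corpus (hypothetical data, not Reuters-21578).
docs = [
    ["oil", "crude", "price"],
    ["oil", "crude", "barrel"],
    ["oil", "export"],
    ["crude", "import"],
    ["wheat", "grain", "harvest"],
    ["wheat", "grain", "export"],
]
vocab = sorted({t for d in docs for t in d})
rules = mine_rules(docs, min_support=0.3, min_confidence=0.6)

raw = [tf_vector(d, vocab) for d in docs]
adjusted = [adjust(v, vocab, rules) for v in raw]

# Documents 2 and 3 share no terms, so the plain vector space model
# sees them as unrelated; the rules oil <-> crude link them.
print("raw similarity:     ", cosine(raw[2], raw[3]))
print("adjusted similarity:", cosine(adjusted[2], adjusted[3]))
```

The adjusted vectors could then be passed to any clustering algorithm that operates on the vector space model, which is the second phase the abstract describes.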
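Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Section 1. Research Motivation
Section 2. Research Objectives
Section 3. Research Scope
Chapter 2 Literature Review
Section 1. Knowledge Discovery in Databases (KDD)
Section 2. Knowledge Discovery in Texts (KDT)
Section 3. Vector Space Model (VSM)
Section 4. Clustering
Section 5. Association Rules
Chapter 3 Research Method
Section 1. Research Process
Section 2. System Architecture
Section 3. Research Design
Chapter 4 Experimental Evaluation
Section 1. Experiment Description
Section 2. Experiment Analysis
Section 3. Experiment Discussion
Chapter 5 Conclusions and Suggestions
Section 1. Conclusions
Section 2. Research Limitations
Section 3. Research Contributions
Section 4. Directions for Future Research
References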