

Graduate student: Yu-Ru Chen
Thesis title: Using Data Mining to Construct a Flexible Web Searching System
Advisor: Don-Lin Yang
Keywords: ontology; web mining; information extraction; World Wide Web; information retrieval; data mining
With the huge amount of information available on the World Wide Web, Web servers provide fertile ground for information searches. Although numerous technologies have been developed for Web searching, there is still much room for improvement. In this thesis we present a new ranking algorithm and an intelligent Web searching system that uses data mining techniques to search and analyze Web documents more flexibly and effectively. Our method exploits the characteristics of Web documents to extract, find, and rank data in a meaningful manner. We combine the hyperlink structure of Web documents with their content to rank the retrieved results. This addresses ranking problems of existing algorithms, such as those caused by multi-frame Web documents and by unrelated linked documents. In addition, we use domain-specific ontologies to improve the query process and to rank retrieved Web documents with better semantic awareness. We also provide friendlier, easier-to-use interfaces.
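The abstract does not spell out the ranking formula, so the following is only a generic sketch of how link structure and content can be combined, not the thesis's own algorithm. It runs a HITS-style hub/authority iteration, after Kleinberg's method, in which each link u -> v is weighted by an assumed content-relevance score of the target page v (similar in spirit to relevance-weighted HITS variants), and then blends each page's authority with its own relevance. The function names `weighted_hits` and `rank`, the relevance scores, and the blending weight `alpha` are all hypothetical.

```python
import math


def weighted_hits(links, relevance, iterations=50):
    """HITS-style hub/authority scores in which each link u -> v is weighted
    by the (assumed) content relevance of the target page v."""
    pages = set(relevance)
    for u, targets in links.items():
        pages.add(u)
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of v: hub scores of pages linking to v, scaled by v's relevance.
        new_auth = {p: 0.0 for p in pages}
        for u, targets in links.items():
            for v in targets:
                new_auth[v] += hub[u] * relevance.get(v, 0.0)
        # Hub of u: authority scores of pages u links to, scaled the same way.
        new_hub = {p: 0.0 for p in pages}
        for u, targets in links.items():
            for v in targets:
                new_hub[u] += new_auth[v] * relevance.get(v, 0.0)
        # Normalize both vectors to unit length so scores stay bounded.
        a_norm = math.sqrt(sum(x * x for x in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in new_hub.values())) or 1.0
        auth = {p: x / a_norm for p, x in new_auth.items()}
        hub = {p: x / h_norm for p, x in new_hub.items()}
    return auth, hub


def rank(links, relevance, alpha=0.5):
    """Blend content relevance with link-based authority; alpha is hypothetical."""
    auth, _ = weighted_hits(links, relevance)
    scores = {p: alpha * relevance.get(p, 0.0) + (1 - alpha) * auth[p] for p in auth}
    return sorted(scores, key=scores.get, reverse=True)


# Toy link graph and made-up relevance scores for a single query.
links = {"a": ["b", "c"], "b": ["c"], "d": ["c"]}
relevance = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
print(rank(links, relevance))  # ['c', 'b', 'a', 'd']
```

Because a link contributes in proportion to the relevance of the page it points to, a heavily linked page that is unrelated to the query (relevance near 0) gains little authority, which is one simple way to damp the effect of unrelated linked documents.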
Two data mining techniques, association rule mining and clustering, are applied online so that users can explore and browse the retrieved documents conveniently. We use association rule mining to find maximal keyword sets, which represent the main characteristics of the retrieved documents. For subsequent queries, these keyword sets are recommended to the user as query terms. Clustering groups the retrieved documents into distinct clusters, making it easier for users to decide which documents to examine. We implemented the proposed system and tested it with both Chinese and English Web documents from the Feng Chia University Web site. The results show that it is indeed a flexible and effective Web searching system.
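As a rough illustration of mining keyword sets from retrieved documents, the sketch below treats each retrieved document as a set of extracted keywords, finds every keyword set contained in at least `min_support` documents with a plain Apriori-style level-wise search, and keeps only the maximal ones. Reading "maximal keyword sets" as maximal frequent itemsets, the choice of Apriori, and the function names `frequent_keyword_sets` and `maximal_sets` are assumptions; the abstract does not say which mining algorithm the system uses.

```python
from itertools import combinations


def frequent_keyword_sets(docs, min_support):
    """Apriori-style search for keyword sets that appear in at least
    `min_support` documents (support counted in documents)."""
    docs = [frozenset(d) for d in docs]
    items = sorted({k for d in docs for k in d})
    frequent = {}
    current = [frozenset([k]) for k in items]
    while current:
        # Count the support of each candidate set.
        level = {}
        for cand in current:
            count = sum(1 for d in docs if cand <= d)
            if count >= min_support:
                level[cand] = count
        frequent.update(level)
        # Join frequent k-sets into (k+1)-candidates, pruning any candidate
        # that has an infrequent k-subset (the Apriori property).
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        candidates = set()
        for a, b in combinations(keys, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(s) in level for s in combinations(union, size - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


def maximal_sets(frequent):
    """Keep only frequent sets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


# Hypothetical keyword sets extracted from four retrieved documents.
docs = [
    {"data", "mining", "web"},
    {"data", "mining", "clustering"},
    {"data", "mining", "web", "search"},
    {"web", "search"},
]
freq = frequent_keyword_sets(docs, min_support=2)
print(sorted(sorted(s) for s in maximal_sets(freq)))
# [['data', 'mining', 'web'], ['search', 'web']]
```

Each maximal set could then be offered to the user as a suggested group of query terms; keeping only maximal sets avoids also suggesting every subset of a recommended group.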
Table of Contents
List of Figures
List of Tables
Chapter 1
Chapter 2 Background and Related Works
2.1 Web Search Engine
2.2 Data Mining
2.2.1 Association rules mining
2.2.2 Sequential pattern mining
2.2.3 Classification and prediction
2.2.4 Clustering
Chapter 3 System Architecture
3.1 Crawler
3.2 Language Processor
3.3 Databases
Chapter 4 Searching Process and Ranking Algorithm
4.1 Searching Process
4.2 Ontology
4.3 Our Ranking Algorithm
4.3.1 Model for Web documents
4.3.2 Linkage expansion
4.3.3 Ranking Algorithm
4.3.4 Ranking result analysis
Chapter 5 Data Miner
5.1 Chinese Phrases Extraction
5.2 Keyword Association
5.3 Document Clustering
Chapter 6 Implementation and Experiments
6.1 Implementation
6.2 Experiment Results
Chapter 7 Conclusions and Future Work
7.1 Summary
7.2 Future Work