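Author: 陳煜儒
Author (English): Yu-Ru Chen
Thesis title: 利用資料探勘建構一個有彈性的網路搜尋系統
Thesis title (English): Using Data Mining to Construct a Flexible Web Searching System
Advisor: 楊東麟
Advisor (English): Don-Lin Yang
Degree: Master's
Institution: Feng Chia University (逢甲大學)
Department: Graduate Institute of Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2002
Academic year of graduation: 90
Language: English
Number of pages: 71
Keywords (translated from Chinese): ontology; Web mining; search engine; data mining; World Wide Web; information retrieval
Keywords (English): ontology; web mining; information extraction; World Wide Web; information retrieval; data mining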
Usage statistics:
  • Cited by: 3
  • Views: 250
  • Downloads: 52
  • Saved to bibliography lists: 12
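With the arrival of an era of massive online information, the World Wide Web has become fertile territory for Web searching. Although many Web search techniques have been developed, they still have many shortcomings that need improvement. In this thesis we apply data mining techniques to propose a new Web page ranking algorithm and an intelligent Web search system that search and analyze documents on the Web more flexibly and effectively. Our method uses the inherent characteristics of Web documents to discover, locate, and evaluate this data more meaningfully. We intelligently combine the hyperlink structure between Web pages with the content of the pages to rank the retrieved documents. This compensates for the weaknesses that current ranking methods show on multi-frame pages and on pages with unrelated hyperlinks. In addition, we use domain-specific ontologies to assist searching and to rank pages in a more semantically informed way. The search system also provides a friendlier, easier-to-operate user interface.
Our system offers two data mining techniques online, association rules and clustering, so that users can conveniently explore and browse the retrieved documents. The keyword sets discovered with association rules represent the main characteristics implied by the retrieved documents, and these keyword sets can serve as suggestions to the user for further searches. Clustering divides the retrieved documents into several groups of different types, making it easier for users to handle a large number of retrieved documents. Finally, we implemented the proposed system and tested it with Web page data from Feng Chia University, including both Chinese and English documents. The results show that it is indeed a flexible and effective Web search system.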
With the huge amount of information available on the World Wide Web, Web servers provide fertile ground for information searches. Although numerous technologies have been developed for Web searching, there is still considerable room for improvement. In this thesis we present a new ranking algorithm and an intelligent Web searching system that use data mining techniques to search and analyze Web documents in a more flexible and effective way. Our method exploits the characteristics of Web documents to extract, find, and rank data in a meaningful manner. We combine hyperlink structure with the content of Web documents to rank the retrieved results. This addresses ranking problems that existing algorithms have with multi-frame Web documents and with documents connected by unrelated links. In addition, we use domain-specific ontologies to improve the query process and to rank retrieved Web documents with better semantic awareness. We also provide friendlier, easier-to-use interfaces.
Two data mining techniques, association rules mining and clustering, are employed online so that users can explore and browse the retrieved documents conveniently. We use association rules mining to find patterns of maximal keyword sets, which represent the main characteristics of the retrieved documents. For subsequent queries, these keyword sets become recommended query terms for the specific user. Clustering groups the retrieved documents into distinct clusters, which helps users make decisions more easily. We have developed the proposed system and tested it with both Chinese and English Web documents from the Web site of Feng Chia University. The results show that it is indeed a flexible and effective Web searching system.
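To make the idea of "maximal keyword sets" concrete, the following is a minimal sketch, not the thesis's actual implementation. It assumes an Apriori-style level-wise search over the keywords of each retrieved document, and it treats a keyword set as maximal when no frequent superset exists. The function names `frequent_keyword_sets` and `maximal_keyword_sets`, the `min_support` parameter, and the sample documents are all hypothetical.

```python
from itertools import combinations


def frequent_keyword_sets(docs, min_support):
    """Return {keyword_set: support} for sets found in at least min_support docs.

    Assumption: a simple Apriori-style level-wise search; the abstract does
    not say which mining algorithm the thesis uses.
    """
    doc_sets = [frozenset(d) for d in docs]

    # Level 1: count single keywords.
    counts = {}
    for d in doc_sets:
        for keyword in d:
            key = frozenset([keyword])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    size = 2
    while current:
        items = sorted(set().union(*current))
        # Keep a candidate only if every (size - 1)-subset is frequent.
        candidates = [
            frozenset(c)
            for c in combinations(items, size)
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        ]
        current = {}
        for cand in candidates:
            support = sum(1 for d in doc_sets if cand <= d)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        size += 1
    return frequent


def maximal_keyword_sets(frequent):
    """Keep only the frequent sets that have no frequent proper superset."""
    sets = list(frequent)
    return [s for s in sets if not any(s < other for other in sets)]


# Hypothetical keyword lists for four retrieved documents.
docs = [
    ["data", "mining", "web"],
    ["data", "mining", "search"],
    ["web", "search", "data"],
    ["data", "mining", "web", "search"],
]
suggestions = maximal_keyword_sets(frequent_keyword_sets(docs, min_support=3))
```

In this sketch each maximal set could be offered to the user as a suggested group of query terms for a follow-up search. The brute-force support counting is adequate for a small result list but would need a more efficient algorithm for large collections.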
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Background and Related Works
2.1 Web Search Engine
2.2 Data Mining
2.2.1 Association rules mining
2.2.2 Sequential pattern mining
2.2.3 Classification and prediction
2.2.4 Clustering
Chapter 3 System Architecture
3.1 Crawler
3.2 Language Processor
3.3 Databases
Chapter 4 Searching Process and Ranking Algorithm
4.1 Searching Process
4.2 Ontology
4.3 Our Ranking Algorithm
4.3.1 Model for Web documents
4.3.2 Linkage expansion
4.3.3 Ranking Algorithm
4.3.4 Ranking result analysis
Chapter 5 Data Miner
5.1 Chinese Phrases Extraction
5.2 Keyword Association
5.3 Document Clustering
Chapter 6 Implementation and Experiments
6.1 Implementation
6.2 Experiment Results
Chapter 7 Conclusions and Future Work
7.1 Summary
7.2 Future Work
References
Acknowledgement
Vita