Graduate student (English name): Yu-Ru Chen
Thesis title (English): Using Data Mining to Construct a Flexible Web Searching System
Advisor (English name): Don-Lin Yang
Keywords (English): ontology; web mining; information extraction; World Wide Web; information retrieval; data mining
With the huge amount of information available on the World Wide Web, Web servers provide fertile ground for information searches. Although numerous technologies have been developed for Web search, there is still considerable room for improvement. In this thesis we present a new ranking algorithm and an intelligent Web searching system that uses data mining techniques to search and analyze Web documents in a more flexible and effective way. Our method exploits the characteristics of Web documents to extract, find, and rank data in a meaningful manner. We use hyperlink structures together with the content of Web documents to rank the retrieved results. This approach addresses ranking problems of existing algorithms, such as multi-frame Web documents and unrelated linking documents. In addition, we use domain-specific ontologies to improve the query process and to rank retrieved Web documents with a better notion of their semantics. We also provide friendlier, easier-to-use interfaces.
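The abstract does not state the ranking formula, so the following is only an illustrative toy of the general idea of combining content and hyperlink evidence, not the algorithm developed in the thesis. The function name `rank_documents`, the weight `alpha`, the term-fraction content score, and the rule that only in-links from query-matching documents count are all assumptions made for this sketch.

```python
def rank_documents(query_terms, docs, links, alpha=0.5):
    """Rank documents by a weighted mix of content and link evidence.

    docs:  {doc_id: list of tokens}
    links: iterable of (source_id, target_id) hyperlinks
    alpha: hypothetical weight of the content score (0..1)
    """
    terms = set(query_terms)

    # Content score (assumed): fraction of a document's tokens that are query terms.
    content = {
        doc_id: (sum(1 for t in tokens if t in terms) / len(tokens)) if tokens else 0.0
        for doc_id, tokens in docs.items()
    }

    # Link score (assumed): count in-links only from documents that match the
    # query themselves, so that links from unrelated pages do not add weight.
    relevant = {d for d, s in content.items() if s > 0}
    inlinks = {d: 0 for d in docs}
    for src, dst in links:
        if src in relevant and dst in inlinks and src != dst:
            inlinks[dst] += 1

    # Normalize in-link counts to 0..1; fall back to 1 to avoid dividing by zero.
    max_in = max(inlinks.values(), default=0) or 1
    scores = {
        d: alpha * content[d] + (1 - alpha) * inlinks[d] / max_in
        for d in docs
    }
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "a": ["web", "search", "engine"],
    "b": ["web", "mining"],
    "c": ["cooking", "recipes"],
}
links = [("b", "a"), ("c", "a"), ("c", "b")]
ranking = rank_documents(["web"], docs, links)
```

In this toy data, the links from the unrelated document "c" are ignored, while the link from "b" to "a" raises the rank of "a".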
Two data mining techniques, association rule mining and clustering, are employed online so that users can explore and browse the retrieved documents conveniently. We use association rule mining to find maximal keyword sets, which represent the main characteristics of the retrieved documents. For subsequent queries, these keywords become recommended sets of query terms for the specific user. Clustering groups the retrieved documents into distinct clusters, which helps users make their decisions more easily. We have implemented the proposed system and tested it with both Chinese and English Web documents from the Web site of Feng Chia University. The results show that it is indeed a flexible and effective Web searching system.
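As a rough illustration of what "maximal keyword sets" means, the sketch below mines frequent keyword sets level by level (in the style of Apriori) and keeps only those with no frequent proper superset. It is a minimal example under stated assumptions, not the thesis's implementation: the function names, the absolute support threshold `min_support`, and the sample documents are all hypothetical.

```python
from itertools import combinations


def frequent_keyword_sets(docs, min_support):
    """Return {keyword_set: count} for sets found in at least min_support docs."""
    doc_sets = [set(d) for d in docs]
    vocabulary = sorted(set().union(*doc_sets))
    frequent = {}
    # Start with single keywords and grow candidate sets one keyword at a time.
    current = [frozenset([w]) for w in vocabulary]
    while current:
        counts = {c: sum(1 for d in doc_sets if c <= d) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join pairs of frequent k-sets that differ in exactly one keyword
        # to form the (k+1)-set candidates for the next level.
        keys = list(level)
        k = len(keys[0]) + 1 if keys else 0
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k})
    return frequent


def maximal_keyword_sets(frequent):
    """Keep only the frequent sets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


# Hypothetical retrieved documents, each reduced to its keywords.
docs = [
    ["web", "mining", "search"],
    ["web", "mining"],
    ["web", "search"],
    ["data", "mining"],
]
freq = frequent_keyword_sets(docs, min_support=2)
maximal = maximal_keyword_sets(freq)
```

With this sample data, the maximal sets are {web, mining} and {web, search}; under the scheme described above, such sets would be the candidates offered to the user as recommended query terms.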
Table of Contents
List of Figures
List of Tables
Chapter 1
Chapter 2
Background and Related Works
2.1 Web Search Engine
2.2 Data Mining
2.2.1 Association rules mining
2.2.2 Sequential pattern mining
2.2.3 Classification and prediction
2.2.4 Clustering
Chapter 3
System Architecture
3.1 Crawler
3.2 Language Processor
3.3 Databases
Chapter 4
Searching Process and Ranking Algorithm
4.1 Searching Process
4.2 Ontology
4.3 Our Ranking Algorithm
4.3.1 Model for Web documents
4.3.2 Linkage expansion
4.3.3 Ranking Algorithm
4.3.4 Ranking result analysis
Chapter 5
Data Miner
5.1 Chinese Phrases Extraction
5.2 Keyword Association
5.3 Document Clustering
Chapter 6
Implementation and Experiments
6.1 Implementation
6.2 Experiment Results
Chapter 7
Conclusions and Future Work
7.1 Summary
7.2 Future Work
