Graduate student: Li-Wei Lu
Thesis title: Using Association Rules to Perform Document Retrieval in the Pre-Specified Domain
Advisor: Don-Lin Yang
Keywords: data mining; negative association rule mining; document retrieval; World Wide Web; association rule mining
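Our method mainly combines two data mining techniques: positive association rules and negative association rules. Association rules are used to discover the relationships between words that are implicit across documents, so as to find useful word sets, and these keyword sets are then used to help users search. Negative association rules are used to discover mutually exclusive relationships between words, and these relationships are used to filter out useless documents so that users get more precise search results. Users can also rate the retrieved documents through a feedback mechanism. With this mechanism, the results found by our method better match users' needs, and users can use this material for self-learning. Finally, we implemented the core of the proposed method and validated it against the search functions provided by journal search engines such as IEEE and ACM and the academic search engines Google Scholar and Scopus. The results show that the method effectively helps users find more useful documents.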
With advances in network and information technology, more and more information is available on the Internet. There are now many ways to obtain knowledge: not only from books or in the classroom, but also from computers, multimedia, and the Internet. With search engines, the World Wide Web becomes a large database rich in resources. Because of the nature of the Internet, we can find study materials anytime and anywhere, and people's learning habits are gradually changing as a result. Although there are numerous Web search engines, there is still much room for improvement. In this paper, we propose a flexible search method based on data mining techniques. We analyze theses from a specialized domain to help retrieve research papers that are closer to the user's needs. Although a search engine can find a great deal of information, its results often contain much redundant data. Our method aims to address this problem.
Our method uses two data mining techniques: association rule mining and negative association rule mining. We use association rule mining to discover the relationships between words hidden in the articles and to identify useful word sets that help the user search for documents. We use negative association rule mining to discover mutually exclusive relationships between words, which lets us filter out useless documents and return more accurate search results. Users can also rate the search results through a feedback mechanism, which improves our method and brings its results closer to users' needs. We have implemented the core of the proposed method and evaluated it on specialized Web sites such as IEEE, ACM, and Google Scholar. The results show that our method helps users find the documents they need.
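The following is a minimal sketch of how the two kinds of rules described above could work together. It is not the implementation used in this thesis. The function names `mine_word_rules` and `search`, the thresholds, and the sample documents are all hypothetical. It assumes each document has already been reduced to a set of words, that a positive rule x -> y requires minimum support and confidence for the co-occurrence of x and y, and that a negative rule x -> not y requires minimum support and confidence for x appearing without y.

```python
from itertools import combinations


def mine_word_rules(docs, min_support=0.3, min_conf=0.6):
    """Return (positive, negative) word rules as lists of (x, y) pairs.

    docs is a list of word sets. A positive pair means x -> y,
    a negative pair means x -> not y. Thresholds are illustrative.
    """
    n = len(docs)
    vocab = sorted(set().union(*docs))
    count = {w: sum(w in d for d in docs) for w in vocab}
    positive, negative = [], []
    for a, b in combinations(vocab, 2):
        both = sum(a in d and b in d for d in docs)
        for x, y in ((a, b), (b, a)):
            # Positive rule: x and y co-occur often enough.
            if both / n >= min_support and both / count[x] >= min_conf:
                positive.append((x, y))
            # Negative rule: x appears without y often enough, and y is
            # itself frequent, so its absence is informative.
            without = count[x] - both
            if (count[y] / n >= min_support
                    and without / n >= min_support
                    and without / count[x] >= min_conf):
                negative.append((x, y))
    return positive, negative


def search(docs, query, positive, negative):
    """Expand the query with positive rules, filter with negative rules."""
    expanded = set(query) | {y for x, y in positive if x in query}
    excluded = {y for x, y in negative if x in query} - expanded
    return [i for i, d in enumerate(docs)
            if d & expanded and not d & excluded]


# Hypothetical sample corpus: "mining" is ambiguous between two topics.
DOCS = [
    {"mining", "association", "rule"},
    {"mining", "association", "database"},
    {"mining", "gold", "coal"},
    {"mining", "gold", "geology"},
    {"association", "rule", "database"},
]

positive_rules, negative_rules = mine_word_rules(DOCS)
hits = search(DOCS, {"gold"}, positive_rules, negative_rules)
```

In this toy corpus, the query "gold" is expanded with "mining" by a positive rule, while negative rules exclude documents that mention "association", "rule", or "database", so only the two gold-mining documents (indices 2 and 3) are returned. A feedback mechanism like the one described above could, for example, adjust the thresholds based on user ratings, but that part is not sketched here.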
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 Background and Related Work
2.1 Search Engine
2.2 Data Mining
2.2.1 Association Rules
2.2.2 Negative Association Rules
Chapter 3 System Architecture
3.1 Crawler
3.2 Language Handler
3.3 Word Association Miner
3.4 Database Connector
3.5 Learning Assistant
Chapter 4 System Working Process
4.1 Data Preprocessing
4.2 Keyword Mining
4.3 Information Searching
Chapter 5 Experimental Results
5.1 Implementation
5.2 Data Preprocessing
5.3 Results and Discussion
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
