跳到主要內容

臺灣博碩士論文加值系統

(35.153.100.128) 您好!臺灣時間:2022/01/19 03:20
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:王聖舜
研究生(外文):Sheng-Shun Wang
論文名稱:在大型資料庫中探勘容錯頻繁樣式
論文名稱(外文):Mining Fault-Tolerant Frequent Patterns in Large Databases
指導教授:李素瑛李素瑛引用關係
指導教授(外文):Suh-Yin Lee
學位類別:碩士
校院名稱:國立交通大學
系所名稱:資訊工程系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2002
畢業學年度:90
語文別:英文
論文頁數:46
中文關鍵詞:資料探勘容錯頻繁樣式
外文關鍵詞:Data miningFault tolerant frequent patternFT-containsupport
相關次數:
  • 被引用被引用:0
  • 點閱點閱:731
  • 評分評分:
  • 下載下載:185
  • 收藏至我的研究室書目清單書目收藏:2
有鑑於現實世界中的資料來源可能會因受到雜訊的干擾而導致資料的不正確。此外, 我們可能希望所找出來的資訊是較具普遍性。因此, FT-Aprori 用來解決在大型現實資料庫中容錯頻繁樣式(fault-tolerant frequent pattern)的探勘。然而,FT-Apriori是根據Apriori 性質採用候選樣式產生及測試(candidate generating-and-testing)方式,因此較無效率。
在本篇論文中,我們採用樣式成長(pattern growth)的方式發展較有效率的容錯頻繁樣式探勘演算法FTP-mine。首先我們考慮所有的資料都能儲存在記憶體的FTP-mine,我們設計一個STable用來計算具有相同字首樣式的item support 和 FT-support. 對於無法將資料儲存於記憶體的大型資料庫, FTP-mine 利用切割資料庫的方式來探勘容錯頻繁樣式. 此外,考慮到所找出來的容錯頻繁樣式可能會很多而且會有互相包含的情況,因此我們延伸FTP-mine 找出最大的容錯頻繁樣式。
在本論文中,我們針對不同的support、tolerance和scalability設計各種實驗,從結果可以看出FTP-mine在各種條件下都比FT-Apriori有較好的執行效率和linear scalability。

In view of real world data may be interfered with noise which leads data to contain faults. The data mining methods proposed previously may not be applicable. Besides, we may hope that the knowledge discovered is more general and can be applied to find more interesting information. Hence, FT-Aprori was proposed for fault-tolerant data mining to discover information over large real-world data. However, FT-Apriori which generates and tests candidates based on Apriori property is not so efficient.
In this paper, we develop memory-based algorithm FTP-mine which is based on the concept of pattern growth to mine fault-tolerant frequent patterns efficiently. In FTP-mine, the table, STable, is designed to count the item support and FT-support of the k-length patterns which have the same prefix of length k-1 by comparing transaction once. As to mining in a large database which is too large to fit in memory, FTP-mine also can be adopted by means of database partition. In addition, since there might exist a large number of fault tolerant frequent patterns and some may be contained in others, we also focus on the finding of maximal FT-frequent patterns by extending the FTP-mine algorithm.
Our study shows that FTP-mine has higher performance than FT-Apriori in all kinds of parameter settings, such as various supports, tolerance, and scalability. The empirical evaluations show that the proposed method has good linear scalability and outperforms FT-Apriori in the discovery of FT-frequent pattern.

Table of Contents
Abstract (in Chinese)…………………………………………………………………...i
Abstract (in English)…………………………………………………………………..ii
Acknowledgment……………………………………………………………………..iii
Table of Contents……………………………………………………………………..iv
List of Figures…………………………………………………………………………v
List of Tables………………………………………………………………………….vi
Chapter 1 Introduction……………………………………………………...…………1
1.1 Motivation and Research Goal……………………………………………….1
1.2 Overview of the Thesis……………………………………………………….3
Chapter 2 Background and Related works…………………………………………….4
2.1 Problem Definition……………………………………………………...........4
2.1.1 Frequent Pattern Mining………………………………………………4
2.1.2 Fault Tolerant Frequent Pattern Mining………………………………5
2.1.3 Frequent Pattern vs. FT-frequent pattern……………………………...6
2.2 FT-Apriori…………………………………………………………………….7
2.3 H-mine………………………………………………………………………..8
Chapter 3 Mining Frequent Pattern…………………………………………………..11
3.1 FTP-mine……………………………………………………………………11
3.2 Mining FT-frequent Patterns in Large Databases…………………………...28
3.3 Mining Maximal FT-frequent Patterns...……………………………………29
Chapter 4 Experimental Result and Analysis………………………………………...35
4.1 Experimental Environment………………………………………………….35
4.2 Experimental Result………………………………………………………...35
4.2.1 Minimum Item Support……………………………………………...36
4.2.2 Minimum Fault Tolerant Support……………………………………37
4.2.3 Fault Tolerance………………………………………………………37
4.2.4 Scalability of Transaction Database Size……………………………38
4.3 Performance Analysis and Discussion……………………………………...40
Chapter 5 Conclusion and Future Works……………………………………………..42
5.1 Conclusion…………………………………………………………………..42
5.2 Future Works……………………………………………………………..…42

[1] J. Pei, A. K. H. Tung, and J. Han, “Fault-Tolerant Frequent Pattern Mining: Problems and Challenges,” Proceedings of ACM-SIGMOD International Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'01), Santa Barbara, CA, May 2001,
[2] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, “H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases,” Proceedings of International Conference on Data Mining (ICDM'01)}, pp. 441-448, San Jose, CA, Nov. 2001.
[3] J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,” Proceedings of International Conference on Data Engineering (ICDE'01), pp. 215-226, Heidelberg, Germany, April 2001.
[4] R. Agrawal and R. Srikant, "Mining Sequential Patterns," Proceedings of the 11th International Conference on Data Engineering (ICDE’95), pp. 3-14 Taipei, Taiwan, 1999.
[5] R. Srikant and R. Agrawal, "Mining Sequential Patterns: Generalizations and Performance Improvements," 5th International Conference on Extending Databases Technology, pp. 3-17, 1996.
[6] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,'' Proceedings of ACM-SIGMOD International Conference on Management of Data (SIGMOD'00), pp. 1-12, Dallas, TX, May 2000.
[7] R. Srikant, and R. Agrawal, “Mining Generalized Association Rules,” Proceedings of the 21th VLDB Conference Zurich, Switzerland, pp. 407-419, 1995.
[8] M. Y. Lin, S. Y. Lee, and S. S. Wang, “DELISP: Efficient Discovery of Generalized Sequential Patterns by Delimited Pattern-Growth Technology,” 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’02), pp.198-209, Taipei, Taiwan, May, 2002.
[9] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proceedings of the 20th VLDB Conference, pp. 487-499, Santiago, Chile, 1994.
[10] H. Mannila, H. Toivonen, and A. I. Verkamo, “Efficient Algorithms for Discovering Association Rules,” KDD-94: AAAI Workshop on Knowledge Discovery in Databases, pp.181-192, Seattle, Washington, July 1994.
[11] R. J. Bayardo Jr., “Efficiently Mining Long Patterns from Databases,” Proceedings of ACM SIGMOD, pp. 85-93, 1998.
[12] J. Pei, J. Han, and R. Mao, “CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets,” Proceedings of ACM-SIGMOD International Workshop on Data Mining and Knowledge Discovery (DMKD'00), pp. 11-20, Dallas, TX, May 2000.
[13] Mohammed J. Zaki, “SPADE: An Efficient Algorithm for Mining Frequent Sequences,” Journal of Machine Learning, special issue on Unsupervised Learning, Vol. 42 Nos. 1/2, pp 31-60, Jan/Feb 2001.
[14] Karam Gouda, Mohammed J. Zaki, “Efficiently Mining Maximal Frequent Itemsets,” Proceedings of IEEE International Conference on Data Mining (ICDM’01), pp. 163-170, San Jose, November 2001.
[15] M. J. Zaki, “Generating Non-Redundant Association Rules,” Proceedings of 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 34-43, Boston, MA, August 2000.
[16] M. J. Zaki, Ching-Jui Hsiao, “CHARM: An Efficient Algorithm for Closed Itemset Mining,” Proceedings of 2nd SIAM International Conference on Data Mining, Arlington, April 2002.
[17] Y. Bastide, R. Taouil, N. Pasquier, G. Stumme and L. Lakhal, “Mining Frequent Patterns with Counting Inference,” SIGKDD Explorations, Vol. 2 , Issue 2, pp. 66-75, Dec. 2000.
[18] R. C. Agarwal, C. C. Aggarwal, and V. V. V. Prasad, “Depth First Generation of Long Patterns,” Proceedings of International Conference on Knowledge Discovery & Data Mining, pp. 108-118, Boston, USA, August 2000.
[19] R. C. Agarwal, C. C. Aggarwal, and V. V. V. Prasad, “A Tree Projection Algorithm for Generation of Frequent Itemsets,” Journal of Parallel and Distributed Computing, 2001, Vol. 61, No.3, pp. 350-371.
[20] W. Wang, J. Yang, and P. S. Yu, “Mining Patterns in Long Sequential Data with Noise,” SIGKDD Explorations, Vol. 2, No. 2, Dec. 2000, pp. 28-33.
[21] M. S. Chen, Jiawei Han and Philip S. Yu, “Data mining: an overview from a database perspective,” IEEE Transaction on Knowledge and Data Engineering, Vol. 8, pp. 866-883, Dec. 1996.
[22] IBM inc. http://www.almaden.ibm.com/cs/quest/syndata.html

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top