National Digital Library of Theses and Dissertations in Taiwan

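Graduate student: 劉宸諭 (Chen-Yu Liu)
Thesis title: 應用AprioriSome演算法於模糊序列樣式探勘
Thesis title (English): Application of AprioriSome Algorithm For Fuzzy Sequential Patterns Mining
Advisor: 郭人介 (Ren-Jie Kuo)
Oral examination committee: 邱垂昱 (Chuei-Yu Chiu), 趙莊敏 (Chung-Min Chao), 駱至中 (Chih-Chung Lo)
Oral examination date: 2006-06-05
Degree: Master's
Institution: National Taipei University of Technology
Department: Department and Graduate Institute of Industrial Engineering and Management
Discipline: Engineering
Field: Industrial Engineering
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar, i.e. the 2005-2006 academic year)
Language: English
Number of pages: 43
Keywords: AprioriAll algorithm; AprioriSome algorithm; Data mining; Fuzzy sequential patterns; k-means algorithm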
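The purpose of data mining is to provide valuable information to decision makers. Depending on user needs, it can be divided into six major categories: cluster analysis, classification analysis, linear regression, time series, association rules, and sequential patterns.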
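Since Agrawal and Srikant proposed sequential pattern mining in 1995, many researchers have worked to improve the efficiency of the algorithm and reduce its computation time, for example GSP, SPADE, and PrefixSpan. Some studies, however, focus on how the quantities recorded in real transaction data affect sequential patterns. Previous work has successfully applied fuzzy concepts to sequential pattern mining, but when the database is very large, the AprioriAll algorithm it relies on is slow, and how to find suitable fuzzy numbers has received little attention in past research.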
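To make the idea of a fuzzy sequential pattern concrete, the following Python sketch computes the fuzzy support of a pattern whose elements are fuzzy items such as "A.High" (a hypothetical label for "item A bought in a high quantity"). It assumes that each transaction already stores the membership degree of every fuzzy item, that the degrees along one occurrence of the pattern are combined with the minimum operator, that the best occurrence within a customer sequence is kept (maximum), and that support is averaged over customers. These are common choices in fuzzy sequential pattern mining, not necessarily the exact definitions used in this thesis; the function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: a transaction maps a fuzzy item (e.g. "A.High")
# to its membership degree; a customer sequence is a time-ordered list of them.
Transaction = Dict[str, float]
CustomerSequence = List[Transaction]


def occurrence_degree(pattern: Sequence[str], seq: CustomerSequence) -> float:
    """Best max-min degree with which `pattern` occurs, in order, in `seq`."""
    m = len(pattern)
    # best[j] = best degree of matching the first j pattern items so far.
    best = [1.0] + [0.0] * m
    for trans in seq:
        # Walk j downward so one transaction matches at most one pattern item.
        for j in range(m, 0, -1):
            degree = trans.get(pattern[j - 1], 0.0)
            best[j] = max(best[j], min(best[j - 1], degree))
    return best[m]


def fuzzy_support(pattern: Sequence[str], db: List[CustomerSequence]) -> float:
    """Average occurrence degree of `pattern` over all customer sequences."""
    if not db:
        return 0.0
    return sum(occurrence_degree(pattern, s) for s in db) / len(db)


db = [
    [{"A.High": 0.8}, {"B.Low": 0.6}],
    [{"B.Low": 1.0}, {"A.High": 0.4}],
    [{"A.High": 0.5, "B.Low": 0.9}, {"B.Low": 0.7}],
]
print(round(fuzzy_support(["A.High", "B.Low"], db), 3))  # 0.367
```

The second customer contributes 0 because "B.Low" never follows "A.High" there; the minimum operator makes a pattern only as strong as its weakest element.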
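To shorten processing time and find appropriate fuzzy numbers, this study first applies the k-means algorithm to find suitable fuzzy numbers and then uses the AprioriSome algorithm to mine fuzzy sequential patterns. Experiments on transaction data from a securities firm and on an example from Microsoft SQL Server 2000 demonstrate the method's superiority when the database is large.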
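The first step, deriving fuzzy sets with k-means, can be sketched as follows. This is a minimal illustration, not the thesis's procedure (described in its Section 3.2): it assumes one-dimensional k-means over the purchase quantities of a single item, a deterministic initialization at the midpoints of k equal slices of the sorted data, and triangular membership functions that peak at the sorted cluster centers and fall to zero at the neighboring centers, with the lowest and highest sets extended as shoulders. The function names `kmeans_1d` and `triangular_sets` are hypothetical.

```python
from typing import Callable, List


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> List[float]:
    """Lloyd's k-means on scalar data; returns the cluster centers in ascending order."""
    data = sorted(values)
    # Deterministic start: the midpoints of k equal slices of the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        groups: List[List[float]] = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # An empty cluster keeps its previous center.
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def triangular_sets(centers: List[float]) -> List[Callable[[float], float]]:
    """One membership function per center (centers assumed sorted and distinct).

    Each triangle peaks at its own center and reaches 0 at the neighboring
    centers; the first and last sets stay at 1 beyond their centers.
    """
    funcs: List[Callable[[float], float]] = []
    for i, c in enumerate(centers):
        left = centers[i - 1] if i > 0 else None
        right = centers[i + 1] if i + 1 < len(centers) else None

        # Default arguments freeze c, left and right for this particular set.
        def mu(x: float, c: float = c, left=left, right=right) -> float:
            if x <= c:
                return 1.0 if left is None else max(0.0, (x - left) / (c - left))
            return 1.0 if right is None else max(0.0, (right - x) / (right - c))

        funcs.append(mu)
    return funcs


quantities = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
centers = kmeans_1d(quantities, 3)          # [2.0, 11.0, 21.0]
low, middle, high = triangular_sets(centers)
print(low(6.5), middle(6.5), high(6.5))     # 0.5 0.5 0.0
```

Placing each peak at a cluster center lets the data decide where "low", "middle" and "high" quantities lie, instead of fixing the fuzzy regions by hand.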
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Its purpose is to provide valuable information to decision makers. Depending on user requirements, data mining tasks can be grouped into six classes: classification analysis, clustering analysis, linear regression, time series, association rules, and sequential patterns.
Since Agrawal and Srikant proposed sequential pattern mining in 1995, many researchers have worked to improve the efficiency and reduce the processing time of the algorithm, for instance GSP, SPADE, and PrefixSpan. In addition, some studies have addressed the fact that transaction data in real-world applications usually consist of quantitative values. Hong et al. integrated the concepts of fuzzy sets and the AprioriAll algorithm to find interesting sequential patterns and fuzzy association rules from transaction data. However, AprioriAll generates and counts all candidate sets, so it is inefficient and time-consuming when mining large sequence databases with numerous patterns. How to define appropriate fuzzy sets is also a problem discussed in many articles.
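The cost attributed to AprioriAll comes from its level-wise candidate generation: each large (k-1)-sequence is joined with every large (k-1)-sequence that shares its first k-2 elements, and candidates containing an infrequent (k-1)-subsequence are pruned before the database is scanned again. The sketch below shows that join-and-prune step under the assumption that large itemsets (litemsets) have already been mapped to integers, so a sequence is simply a tuple of integers; `candidate_gen` is a hypothetical name, and this is an illustration of the general technique rather than the thesis's code.

```python
from typing import Set, Tuple

# A sequence of litemsets, each litemset already mapped to an integer ID.
Seq = Tuple[int, ...]


def candidate_gen(large_prev: Set[Seq]) -> Set[Seq]:
    """Build C_k from L_{k-1}: join on the first k-2 elements, then prune."""
    # Join: order matters in sequences, so both (p, q) and (q, p) are tried,
    # including p joined with itself.
    joined = {
        p + (q[-1],)
        for p in large_prev
        for q in large_prev
        if p[:-1] == q[:-1]
    }
    # Prune: every (k-1)-subsequence obtained by deleting one element must be large.
    return {
        c for c in joined
        if all(c[:i] + c[i + 1:] in large_prev for i in range(len(c)))
    }


L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
print(candidate_gen(L3))  # {(1, 2, 3, 4)}
```

Here nine sequences survive the join but only (1, 2, 3, 4) survives the prune, because every other joined sequence contains a 3-subsequence, such as (2, 4, 3) or (3, 4, 5), that is not in L3. AprioriAll repeats this generation and a full database scan for every length k, which is the overhead AprioriSome tries to reduce.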
To reduce the processing time of fuzzy sequential pattern mining and to derive the membership functions automatically (using the k-means algorithm), this study proposes applying the AprioriSome algorithm to fuzzy sequential pattern mining. Two experiments, one on transaction data provided by a securities firm and one on the FoodMart data from SQL Server 2000, demonstrate the strength of Fuzzy AprioriSome Sequential Pattern Mining when the transaction data are large.
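AprioriSome saves time over AprioriAll by counting only some sequence lengths in its forward phase and filling in the skipped lengths in a backward phase. Which length to count next depends on the hit ratio hit_k = |L_k| / |C_k| of the last length that was counted. The sketch below encodes that next-length heuristic; the threshold values follow the heuristic as commonly quoted from Agrawal and Srikant [12] and are an assumption here, to be checked against the thesis's Next function (Figure 3.4). The names `hit_ratio` and `next_length` are hypothetical.

```python
def hit_ratio(num_large: int, num_candidates: int) -> float:
    """hit_k = |L_k| / |C_k| for the last length that was actually counted."""
    return num_large / num_candidates if num_candidates else 0.0


def next_length(k: int, hit_k: float) -> int:
    """Next sequence length to count in the forward phase (assumed thresholds)."""
    if hit_k < 0.666:
        return k + 1   # many candidates turned out small: do not skip
    if hit_k < 0.75:
        return k + 2
    if hit_k < 0.80:
        return k + 3
    if hit_k < 0.85:
        return k + 4
    return k + 5       # almost every candidate was large: skip furthest ahead


print(next_length(2, hit_ratio(45, 50)))  # 7
```

A high hit ratio means most candidates are large anyway, so skipping intermediate lengths costs little; 45 large sequences out of 50 candidates at length 2 gives hit_2 = 0.9, and the forward phase would next count length 7.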
ABSTRACT
ABSTRACT (IN CHINESE)
ACKNOWLEDGMENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Motivation
1.2 Objectives
1.3 The Scope and Requirement
1.4 Research Structure
Chapter 2 Literature Review
2.1 Data Mining
2.2 Sequential Patterns
2.2.1 Apriori-based Method
2.2.2 Pattern-growth Method
2.3 Fuzzy Set
2.4 Fuzzy Sequential Patterns
Chapter 3 Methodology
3.1 Framework
3.2 Definition of Fuzzy Membership Functions
3.2.1 Clustering Techniques (k-Means)
3.2.2 Determining the Membership Function
3.3 Fuzzy Sequential Patterns Mining with AprioriSome Algorithm
3.4 Verification
Chapter 4 Experimental Results
4.1 Data Sources
4.2 Case 1: Securities Firm
4.2.1 K-means for Finding Fuzzy Set
4.2.2 Sequential Patterns of Case 1
4.3 Case 2: Foodmart Data
4.3.1 K-means for Finding Fuzzy Set
4.3.2 Sequential Patterns of Case 2
4.4 Experimental Environment and Parameter Settings
4.5 Experimental Result and Analysis
Chapter 5 Conclusions
5.1 Conclusions
5.2 Contributions
5.3 Suggestions
References










LIST OF FIGURES

Figure 1.1 Procedure of this study
Figure 2.1 AprioriAll algorithm
Figure 2.2 AprioriSome algorithm
Figure 2.3 Model for generating fuzzy sequential patterns
Figure 3.1 Architecture of the proposed algorithm
Figure 3.2 k-Means algorithm
Figure 3.3 Membership function
Figure 3.4 Next function
Figure 4.1 Data set used in case 1
Figure 4.2 Data grouped by customer ID (case 1)
Figure 4.3 Clustering of case 1
Figure 4.4 Membership function (case 1)
Figure 4.5 Fuzzy AprioriSome program
Figure 4.6 Data grouped by customer ID (case 2)
Figure 4.7 Membership function (case 2)
Figure 4.8 Performance of the two algorithms in case 1
Figure 4.9 Performance of the two algorithms in case 2
Figure 4.10 Performance with different data sizes


LIST OF TABLES

Table 4.1 Stock type
Table 4.2 Fuzzy sets transformed from the data in Fig. 4.3
Table 4.3 Parameters of membership function
Table 4.4 Large 1-itemsets mapped to contiguous integers
Table 4.5 Sequential patterns of case 1
Table 4.6 Different fuzzy support of sequential patterns
Table 4.7 Item type
Table 4.8 Fuzzy sets transformed from the data in Fig. 4.5
Table 4.9 Sequential patterns of case 2
REFERENCES

[1] A.W.C. Fu, “Finding Fuzzy Sets for the Mining of Association Rules for Numerical Attributes,” Proceedings of IDEAL, 1998, pp. 263-268.
[2] A. Gyenesi, “Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems,” Proceedings of Advances in Fuzzy Systems and Evolutionary Computation, 2001, pp. 48-53.
[3] D. Jiang, C. Tang and A. Zhang, “Cluster Analysis for Gene Expression Data: A Survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 11, 2004, pp. 1370-1386.
[4] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal and M.C. Hsu, “FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining,” Proceedings of the 2000 ACM SIGKDD International Conference on Knowledge Discovery in Databases (KDD ’00), 2000, pp. 355-359.
[5] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal and M.C. Hsu, “PrefixSpan: Mining Sequential Patterns by Prefix-Projected Pattern Growth,” Proceedings of the 17th International Conference on Data Engineering, 2001, pp. 215-224.
[6] K. J. Cios, W. Pedrycz and R. W. Swiniarski, Data Mining Methods for Knowledge Discovery, United States of America: Kluwer Academic Publishers.
[7] S. Theodoridis and K. Koutroumbas, Pattern Recognition, San Diego, CA, USA: Academic Press.
[8] M. J. Zaki, “SPADE: An Efficient Algorithm for Mining Frequent Sequences,” Machine Learning, vol. 42, 2001, pp. 31-60.
[9] M. Garofalakis, R. Rastogi and K. Shim, “Mining Sequential Patterns with Regular Expression Constraints,” IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 3, 2002, pp. 530-552.
[10] M. Kaya and R. Alhajj, “Multi-objective Genetic Algorithm Based Approach for Optimizing Fuzzy Sequential Patterns,” Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2004, pp. 396-400.
[11] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proceedings of the 1994 International Conference on Very Large Data Bases (VLDB ’94), 1994, pp. 487-499.
[12] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proceedings of the 1995 International Conference on Data Engineering (ICDE ’95), 1995, pp. 3-14.
[13] R. Agrawal and R. Srikant, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proceedings of the 5th International Conference on Extending Database Technology (EDBT), 1996, pp. 3-17.
[14] R.S. Chen, G.H. Tzeng, C.C. Chen and Y.C. Hu, “Discovery of Fuzzy Sequential Patterns for Fuzzy Partitions in Quantitative Attributes,” ACS/IEEE International Conference on Computer Systems and Applications, 2001, pp. 144-150.
[15] T.P. Hong, C.S. Kuo and S.C. Chi, “Mining Fuzzy Sequential Patterns from Quantitative Data,” Proceedings of the 1999 IEEE International Conference on Systems, Man, and Cybernetics (SMC ’99), vol. 3, 1999, pp. 962-966.
[16] T.P. Hong, K.Y. Lin and S.L. Wang, “Mining Fuzzy Sequential Patterns from Multiple-Item Transactions,” Joint 9th IFSA World Congress and 20th NAFIPS International Conference, vol. 3, 2001, pp. 1317-1321.
[17] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996.
[18] W. J. Frawley, G. Piatetsky-Shapiro and C. J. Matheus, “Knowledge Discovery in Databases: An Overview,” AAAI/MIT Press, 1991.
[19] 林桂英, “Mining Fuzzy Multiple-Level Association Rules and Fuzzy Sequential Patterns from Numerical Data” (in Chinese), Master's thesis, Department of Information Engineering, I-Shou University, Kaohsiung, 2000.
[20] 張玉玫, “A Study of Individual Investors' Repeat Stock-Purchasing Behavior” (in Chinese), Master's thesis, Graduate Institute of Finance, Fu Jen Catholic University, Taipei, 2004.
[21] 楊俊能, “Predicting Online Customers' Next-Period Purchasing Status with Sequential Pattern Mining Techniques” (in Chinese), Master's thesis, Graduate Institute of Production Systems Engineering and Management, National Taipei University of Technology, Taipei, 2004.