National Digital Library of Theses and Dissertations in Taiwan

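Author: Ya-Han Hu (胡雅涵)
Title: The Research of Customer Purchase Behavior Using Constraint-based Sequential Pattern Mining Approach
Title (Chinese): 使用以限制為基礎的序列規則方法的顧客購買行為研究
Advisor: 陳彥良
Degree: Doctoral
Institution: National Central University
Department: Graduate Institute of Information Management
Discipline: Computer Science
Subdiscipline: General Computer Science
Thesis type: Academic thesis
Year of publication: 2007
Graduation academic year: 95
Language: English
Pages: 73
Keywords (translated from Chinese): sequence data; constraint-based data mining; temporal database
Keywords (English): constraint-based mining; temporal database; sequential pattern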
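Abstract (translated from the Chinese):
Sequential pattern mining is an important data-mining method whose goal is to discover time-related behavioral patterns in sequence databases. In recent years it has been applied in many domains, such as marketing decisions, medical record analysis, and sales analysis. Most earlier sequential pattern mining methods consider only the frequency of sequential patterns, mainly because earlier sequence analyses assumed that sequence data do not change over time. In practice, however, business sales data are highly dynamic and complex, so sequential behavior often changes over time. We divide this problem into two sub-problems, sequential pattern mining in a business-to-business (B2B) environment and in a business-to-customer (B2C) environment, because the sequence data in each environment have their own characteristics. We then introduce three new concepts: recency, repetition, and compactness. Recency lets the generated patterns reflect the most recent behavior; repetition ensures that the number of times a pattern occurs within a sequence meets a user-specified minimum; and compactness ensures that a pattern occurs within a user-defined time interval. Using these three concepts, we define two distinct kinds of sequential pattern for the two environments and develop two efficient algorithms. We also carry out a thorough experimental evaluation. The results show that the two proposed algorithms are efficient and that, when the sequence data are highly dynamic, they find more interesting sequential patterns than traditional methods.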
Abstract (English):
Sequential pattern mining is an important data-mining method for discovering time-related behavior in sequence databases. The information it yields can be used in marketing, medical record analysis, sales analysis, and other domains. Existing methods focus only on pattern frequency, on the assumption that the behavior of sequences does not change over time. Business sales environments, however, are highly dynamic and complicated, so the behavior of sequences may change over time. In this study, we first divide the problem into two sub-problems, sequential pattern mining in the business-to-business (B2B) environment and in the business-to-customer (B2C) environment, because each has unique sequence characteristics. Three new concepts, recency, repetition, and compactness, are then incorporated into traditional sequential pattern mining to discover meaningful patterns in these two environments. Recency makes patterns adapt quickly to the latest behavior in the sequence database. Repetition ensures that the number of occurrences of a pattern in a data-sequence meets user-specified thresholds. Compactness ensures that discovered patterns have reasonable time spans. Two new kinds of pattern and efficient algorithms for mining them are presented in this dissertation, together with thorough empirical evaluations. The results show that the proposed methods are computationally efficient and are more advantageous than traditional methods when the behavior of sequences changes over time.
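The three constraints named in the abstract can be illustrated with a small Python sketch. It is not the dissertation's CFR-PostfixSpan or CFR2-apriori algorithm, whose formal definitions are not given in this record. It rests on several assumptions made only for illustration: a data-sequence is a time-ordered list of (timestamp, item) events, compactness is a cap on an occurrence's time span (`max_span`), repetition is a minimum count of non-overlapping compact occurrences per sequence (`min_repeat`), and recency requires at least one occurrence to end at or after a cutoff time (`recent_after`). All names and the toy database are hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, item); hypothetical representation


def compact_occurrences(seq: Sequence[Event], pattern: Sequence[str],
                        max_span: int) -> List[Tuple[int, int]]:
    """Greedily collect non-overlapping occurrences of a non-empty `pattern`
    whose time span (last timestamp - first timestamp) is at most max_span.
    Returns (start_time, end_time) pairs."""
    found = []
    i = 0
    while i < len(seq):
        if seq[i][1] != pattern[0]:
            i += 1
            continue
        # Match the remaining pattern items as early as possible after i.
        pos, k, j = i, 1, i + 1
        while k < len(pattern) and j < len(seq):
            if seq[j][1] == pattern[k]:
                pos, k = j, k + 1
            j += 1
        if k == len(pattern) and seq[pos][0] - seq[i][0] <= max_span:
            found.append((seq[i][0], seq[pos][0]))
            i = pos + 1  # continue after this occurrence (no overlap)
        else:
            i += 1
    return found


def satisfies(seq: Sequence[Event], pattern: Sequence[str], max_span: int,
              min_repeat: int, recent_after: int) -> bool:
    """Assumed reading of the constraints: enough compact occurrences
    (repetition), at least one of them ending recently (recency)."""
    occ = compact_occurrences(seq, pattern, max_span)
    repeated = len(occ) >= min_repeat
    recent = any(end >= recent_after for _, end in occ)
    return repeated and recent


def constrained_support(db: Sequence[Sequence[Event]], pattern: Sequence[str],
                        max_span: int, min_repeat: int,
                        recent_after: int) -> float:
    """Fraction of data-sequences in which the pattern meets all constraints."""
    hits = sum(satisfies(s, pattern, max_span, min_repeat, recent_after)
               for s in db)
    return hits / len(db)


# Toy sequence database (made up for illustration).
db = [
    [(1, "a"), (2, "b"), (10, "a"), (11, "b")],  # compact, repeated, recent
    [(1, "a"), (9, "b")],                        # occurs, but span is 8
    [(1, "a"), (2, "b"), (3, "a"), (4, "b")],    # compact, repeated, not recent
]

# Frequency only: loose constraints accept every sequence.
frequency_only = constrained_support(db, ("a", "b"), 100, 1, 0)
# All three constraints: only the first sequence qualifies.
constrained = constrained_support(db, ("a", "b"), 3, 2, 8)
```

On this toy data the pattern (a, b) appears in every sequence, so a frequency-only measure cannot tell the three sequences apart, while the constrained measure keeps only the sequence whose occurrences are tight, repeated, and recent. The greedy scan is a simplification and does not guarantee the maximum possible number of non-overlapping occurrences.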
Table of Contents
List of Illustrations
List of Tables
Chapter 1. Introduction
1.1. Motivations and Research Objectives
1.2. Considering Time Constraints on Sequential Pattern Mining in B2C Environment
1.3. Considering Time Constraints on Sequential Pattern Mining in B2B Environment
1.4. Organization of the Dissertation
Chapter 2. Literature Review
2.1. Sequential Pattern Mining: An Overview
2.2.1. Improve the Efficiency in Sequential Pattern Mining Process
2.2.2. Extend the Mining of Sequential Pattern to Other Time-Related Patterns
2.2. Data Mining in a Changing Environment
2.3. Constraint-based Sequential Pattern Mining
2.4. Discussion
Chapter 3. The Problem of Sequential Pattern Mining in B2C Environment
3.1. Problem Definition
3.2. The CFR-PostfixSpan Algorithm
3.3. Experimental Study
3.3.1. Data
3.3.2. Performance Measures
3.3.3. Experimental Setup
3.4. Results and Discussions
3.5. Summary
Chapter 4. The Problem of Sequential Pattern Mining in B2B Environment
4.1. Problem Definition
4.2. Algorithm
4.2.1. The CFR2-apriori Algorithm
4.2.2. The Support Counting Process
4.3. Performance Evaluation
4.3.1. Synthetic Data Generation and Real-Life Data
4.3.2. Performance Evaluation
4.4. Summary
Chapter 5. Conclusions and Future Research
References
Appendix