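Graduate student: 張啟原 (Chang, Chi-Yuan)
Thesis title: 期間限制探勘於高效用序列樣式 (Mining High Utility Sequential Patterns with Duration Constraints)
Advisor: 胡雅涵 (Hu, Ya-Han)
Committee members: 許巍嚴 (Hsu, Wei-Yen), 翁政雄 (Weng, Cheng-Hsiung)
Oral defense date: 2014-10-03
Degree: Master's
Institution: 國立中正大學 (National Chung Cheng University)
Department: Department and Graduate Institute of Information Management
Discipline: Computer Science
Subdiscipline: General Computer Science
Thesis type: Academic thesis
Publication year: 2014
Graduation academic year: 103 (ROC calendar)
Language: English
Pages: 38
Keywords: utility sequential pattern mining; PrefixSpan; time constraint-based mining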
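High utility sequential pattern mining is an important application in the field of data mining, and it is widely used in purchase behavior analysis. Through high utility sequential pattern mining, businesses can gain insight into customers' purchasing habits, learn how products relate to one another and which high-priced products most customers buy, and set sales policy accordingly.
However, in previous studies, the patterns produced by high utility sequential pattern mining include patterns that span a very long purchasing period, and these carry little meaning as purchase relationships. To find more meaningful patterns, we add time-based filtering conditions to the patterns being sought.
In this thesis, we add a duration constraint and gap constraints. For the duration constraint, we propose a maximum span length, so that the patterns found occur within a specific period of time. For the gap constraints, we propose a maximum gap (maxgap) and a minimum gap (mingap). This study proposes the HUD algorithm, which integrates these time constraints and adapts the PrefixSpan algorithm to mine utility sequential patterns. In the experiments, we measure runtime, number of patterns, average value per pattern, precision, recall, and F-measure to compare our method with the traditional method.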

Utility sequential pattern mining (utility SPM) is one of the most important data mining techniques, and it is widely used in customer behavior analysis. Through utility SPM, organizations can explore customers' purchasing habits, understand the relationships among products, and learn which high-priced products most customers buy, and then develop sales policy accordingly.
However, in previous studies of conventional utility SPM, if the average length of the sequences in the database is long, the algorithm often generates too many long sequential patterns, which are not meaningful as purchase relationships. To find more meaningful patterns, we add time constraints to the pattern mining.
In this paper, we include duration constraints in utility SPM. Specifically, we propose a maximum span length constraint, which is intended to find patterns that occur within a specific period of time. In addition, the maxgap and mingap constraints are used to confine the time interval between adjacent events to a reasonable range. We propose a new framework, the High Utility sequential pattern mining with Duration constraints (HUD) algorithm, to mine high utility sequential patterns by integrating these constraints. In the experiments, we measure runtime, number of patterns, value per pattern, precision, recall, and F-measure to compare the performance of our method with that of traditional utility SPM.
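
As an illustration of the three time constraints named in the abstract, the following minimal Python sketch checks a single occurrence of a pattern against a maximum span length, a maxgap, and a mingap. It is not the HUD algorithm itself: the function name `satisfies_time_constraints`, the integer timestamps, and the choice of inclusive bounds are all assumptions made for this example, and the thesis may define the constraints differently.

```python
from typing import Sequence


def satisfies_time_constraints(
    timestamps: Sequence[int],
    min_gap: int,
    max_gap: int,
    max_span: int,
) -> bool:
    """Check one occurrence of a pattern against the three time constraints.

    `timestamps` holds the time of each matched element, in pattern order.
    Inclusive bounds are an assumption of this sketch, not taken from the thesis.
    """
    if not timestamps:
        return False

    # Duration constraint: the whole occurrence must fit within max_span.
    if timestamps[-1] - timestamps[0] > max_span:
        return False

    # Gap constraints: every pair of adjacent elements must be separated
    # by at least min_gap and at most max_gap time units.
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap < min_gap or gap > max_gap:
            return False

    return True
```

For example, with `min_gap=1`, `max_gap=3`, and `max_span=5`, an occurrence at times `[1, 3, 6]` passes (gaps of 2 and 3, span of 5), while `[1, 3, 8]` fails because the gap of 5 exceeds `max_gap` and the span of 7 exceeds `max_span`.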

1. Introduction
1.1 Background
1.2 Motivation
1.3 Organization
2. Related Work
2.1 Utility Pattern Mining (UPM)
2.2 Utility SPM
2.3 Constraint-based SPM
3. Problem Definition
3.1 Problem Definition
4. The HUD-PrefixSpan Algorithm
4.1 Find 1-length-HUD-SPs
4.2 Divide and search
4.3 Find subsets of sequential patterns
5. Experiment Design
5.1 Real-life Datasets
5.2 Experiment Evaluation
6. Conclusion and future works
Reference

