跳到主要內容

臺灣博碩士論文加值系統

(18.97.9.175) 您好!臺灣時間:2024/12/09 21:18
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:林順意
研究生(外文):Shun-Yi Lin
論文名稱:ㄧ個以交易序號及時間值來探勘序列資料庫中之時間間隔序列樣式的方法
論文名稱(外文):The Transation-ID-Time Algorithm for Mining Time-Interval Sequential Patterns in Sequence Databases
指導教授:張玉盈
指導教授(外文):Ye-In Chang
學位類別:碩士
校院名稱:國立中山大學
系所名稱:資訊工程學系研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2013
畢業學年度:101
語文別:英文
論文頁數:94
中文關鍵詞:深度優先搜尋資料探勘序列樣式終端邊時間間隔序列樣式
外文關鍵詞:Data miningTime-interval sequential patternsSequential patternsTerminal edgeDepth-first search
相關次數:
  • 被引用被引用:0
  • 點閱點閱:163
  • 評分評分:
  • 下載下載:11
  • 收藏至我的研究室書目清單書目收藏:0
我們可以從大型資料庫中利用資料探勘技術取出一些資料庫內含的豐富資訊,包括一些先前未知的訊息,可能潛在有用的訊息。從大型資料庫中所發現的訊息和知識是非常有用的,它可管範的利用在各種應用,包括市場分析,決策支持,缺點發掘和企業管理等。在這之前許多學者提出了眾多方法來挖掘資料庫中的訊息,而序列樣式的挖掘是其中重要的方法之一。舉一個典型序列樣式的例子,當有一個顧客買了一台電腦後,過不久又返回購買一個掃描器和麥克風。儘管這種類型的序列樣式記錄了每個項目的購買順序,但卻無法提供我們每個項目間的時間間隔資訊。因此後來,一個延伸的新序列樣式被提出,叫時間間隔序列樣式,時間間隔序列樣式不但記錄了每個項目的順序,而且也提供了每個連續項目間的時間間隔資訊。挖掘時間間隔序列樣式有一個非常著名的OGST 演算法,OGST 演算法主要是利用所有長度為2 的時間間隔序列樣式來建構一個有方向性的graph。接著,OGST 演算法利用深度優先搜尋的方法去搜尋此graph,並找出所有時間間隔序列樣式。儘管如此,在一個序列中OGST 演算法不但浪費空間去儲存每個項目的位置,而且還搜尋了許多不需要的路徑。因此OGST 演算法的效能可被改善。而且,我們發現在OGST 演算法中會出現一些不正確的結果和一些遺失的序列樣式。為了避免OGST 演算法這些問題,我們提出了更有效率的TIDT演算法找出時間間隔序列樣式。我們提出針對有方向性的graph 修正版,將有方向性的graph 再多加一些有用的資訊(Sid, Stime, Etime)。有這些新加入資訊後,我們可避免搜尋到不需要的路徑。基於修正有方向性的graph 後,我們利用提出的TIDT 演算法便能更有效率的找出時間間隔序列樣式。我們也利用產生終端邊並且記錄曾經搜尋過的路徑資訊。動態地透過終端邊來修改原有graph,可以避免在搜尋graph 時重複搜尋某些路徑。根據模擬的結果,我們顯示出我們提出的TIDT 演算法比OGST 演算法更有效率。
Data mining extracts implicit, previously unknown and potentially useful informationfrom databases. The discovered information and knowledge are useful forvarious applications, including market analysis, decision support, flaw detection andbusiness management. Many approaches have been proposed to extract information,and the mining of sequential patterns is one of the most important methods. A typicalexample of a sequential pattern is like that a customer who has bought a computer,returns to buy a scanner and a microphone. Although this kind of sequential patternincludes the order of the items, the time between the items is unknown. Therefore, anextension of sequential patterns, called time-interval sequential patterns is proposed,which not only reveals the order of items but also the time intervals between successiveitems. The OGST algorithm finds out the time-interval sequential patternsspecially. The OGST algorithm uses the 2-time-interval sequence to construct an orientedgraph. Then, it uses the depth-first search method to traverse the graph, andfind out the all large-time-interval sequences. However, the OGST algorithm not onlywastes space to store the position of item i in a sequence, but also traverses manyunnecessary paths. The performance could be improved. Moreover, we find severalincorrect results and missing sequences based on the OGST algorithm. To avoid theseproblems, in this thesis, we propose the TIDT algorithm to find out the time-intervalsequential patterns efficiently. We propose a revised version of the oriented graph byrecording some more information (Sid, Stime, Etime). Based on the revised versionof the oriented graph, we use our proposed TIDT algorithm to efficiently find thetime-interval sequential patterns. We also use a terminal edge to reduce the searchcost which avoids the case of traversing the same vertex next time. We need not totraverse the same path again. From our performance study based on the syntheticdata, we show that our proposed TIDT algorithm is more efficient than the OGSTalgorithm.
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Mining Association Rules . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The Repeating Patterns . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Non-Trivial Repeating Patterns . . . . . . . . . . . . . . . . . 2
1.2.2 Polyphonic Repeating Patterns . . . . . . . . . . . . . . . . . 3
1.2.3 Maximum-Length Repeating Patterns . . . . . . . . . . . . . . 4
1.3 Mining Sequential Patterns . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Sequential Patterns without Time-Interval . . . . . . . . . . . 5
1.3.2 Sequential Patterns with Time-Interval . . . . . . . . . . . . . 6
1.3.3 Practical Applications . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . 10
2. A Survey of Algorithms for Mining Sequential Patterns . . . . . . 12
2.1 The Apriori Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 The M2P Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 The I-Apriori Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 The OGST Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.2 The OGST Algorithm . . . . . . . . . . . . . . . . . . . . . . 23
3. The Transation-ID-Time Algorithm . . . . . . . . . . . . . . . . . . 27
3.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 The Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.1 Step 1: Scan the Database to Produce Large 1-Sequences (L1) 30
3.2.2 Step 2: Produce Large 2-Time Interval Sequences (LT2) from
IT Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.3 Step 3: Construct a SIRG(Sid Item Relation Graph) . . . . . 39
3.2.4 Step 4: Search the SIRG to Produce the Large Sequences and
Sequential Patterns . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3 A Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4. Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.1 Generation of Synthetic Data . . . . . . . . . . . . . . . . . . . . . . 63
4.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2.1 Time Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2.2 Sensitivity to Parameters . . . . . . . . . . . . . . . . . . . . . 70
5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Very Large Data Bases, pp. 487–499, 1994.
[2] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proc. of the 11th Int. Conf. on Data Engineering, pp. 3–14, 1995.
[3] F. Ao, Y. Yan, J. Huang, and K. Huang, “Mining Maximal Frequent Itemsets in Data Streams Based on FP Tree,” Proc. of the 5th Int. Conf. on Machine Learning and Data Mining in Pattern Recognition, pp. 479–489, 2007.
[4] C. I. Chang, H. E. Chueh, and N. P. Lin, “Sequential Patterns Mining with Fuzzy Time-Intervals,” Proc. of the 6th Int. Conf. on Fuzzy Systems and Knowledge Discovery, pp. 165–169, 2009.
[5] C. I. Chang, H. E. Chueh, and Y. C. Luo, “An Integrated Sequential Patterns Mining with Fuzzy Time-Intervals,” Proc. of Int. Conf. on Systems and Informatics, pp. 2294–2298, 2012.
[6] Y. I. Chang, C. E. Li, and T. H. Chen, “A Position-Join Method for Mining Maximum-Length Repeating Patterns in Music Databases,” Proc. of National Computer Symposium, pp. 1–12, 2011.
[7] J. Chen, “An Updown Directed Acyclic Graph Approach for Sequential Pattern Mining,” IEEE Trans. on Knowledge and Data Engineering, Vol. 22, No. 7, pp. 913–928, 2010.
[8] Y. L. Chen, M. C. Chiang, and M. T. Ko, “Discovering Time-Interval Sequential Patterns in Sequence Databases,” Expert Systems with Applications, Vol. 25, No. 3, pp. 343–354, 2003.
[9] Y. L. Chen and T. C.-K. Huang, “Discovering Fuzzy Time-Interval Sequential Patterns in Sequence Databases,” IEEE Trans. on Systems, Man, and Cybernetics, Vol. 35, No. 5, pp. 959–972, 2005.
[10] S. Chiu, M. Shan, J. Huang, and H. Li, “Mining Polyphonic Repeating Patterns from Music Data Using Bit-String Based Approaches,” Proc. of IEEE Int. Conf. on Multimedia and Expo, pp. 1170–1173, 2009.
[11] J. Han, J. Peid, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” ACM SIGMOD Record, Vol. 29, No. 2, pp. 1–12, 2000.
[12] J. L. Hsu, C. C. Liu, and A. L. P. Chen, “Efficient Repeating Pattern Finding in Music Databases,” Proc. of the 7th Int. Conf. on Information and Knowledge Management, pp. 281–288, 1998.
[13] J. L. Hsu, C. C. Liu, and A. L. P. Chen, “Discovering Non-trivial Repeating Patterns in Music Data,” IEEE Trans. on Multimedia, Vol. 3, No. 3, pp. 311–325, 2001.
[14] Y. H. Hu, T. C.-K. Huang, H. R. Yang, and Y. L. Chen, “On Mining Multi-Time-Interval Sequential Patterns,” Data and Knowledge Engineering, Vol. 68, No. 10, pp. 1112–1127, 2009.
[15] Y. H. Hu, F. Wu, and C. I. Yang, “Mining Multi-Level Time-Interval Sequential Patterns in Sequence Databases,” Proc. of the 2th Int. Conf. on Software Engineering and Data Mining, pp. 416–421, 2010.
[16] C. R. Ji and Z. H. Deng, “Mining Frequent Ordered Patterns without Candidate Generation,” Proc. of the 4th Int. Conf. on Fuzzy Systems and Knowledge Discovery, pp. 402–406, 2007.
[17] S. Joshi and R. Jain, “A Dynamic Approach for Frequent Pattern Mining Using Transposition of Database,” Proc. of the 2th Int. Conf. on Communication Software and Networks, pp. 498–501, 2010.
[18] I. Karydis, A. Nanopoulos, and Y. Manolopoulos, “Finding Maximum-Length Repeating Patterns in Music Databases,” Multimedia Tools and Applications, Vol. 32, No. 1, pp. 49–71, 2006.
[19] H. Li and N. Zhang, “Mining Maximal Frequent Itemsets Over a Stream Sliding Window,” Proc. of Information Computing and Telecommunications, pp. 110–113, 2010.
[20] K. C. Lin, I. E. Liao, and Z. S. Chen, “An Improved Frequent Pattern Growth Method for Mining Association Rules,” Expert Systems with Applications, Vol. 38, No. 5, pp. 5154–5161, 2011.
[21] J. Liu, S. Yan, and J. Ren, “The Design of Storage Structure for Sequence in Incremental Sequential Patterns Mining,” Proc. of the 6th Int. Conf. on Networked Computing and Advanced Information Management, pp. 330–334, 2010.
[22] J. Liu, S. Yan, and J. Ren, “The Design of Frequent Sequence Tree in Incremental Mining of Sequential Patterns,” Proc. of IEEE Int. Conf. on Software Engineering and Service Science, pp. 679–682, 2011.
[23] E. H. C. Lu, V. S. Tseng, and P. S. Yu, “Mining Cluster-Based Temporal Mobile Sequential Patterns in Location-Based Service Environments,” IEEE Trans. On Knowledge and Data Engineering, Vol. 23, No. 6, pp. 914–927, 2011.
[24] G. Mao, X. Wu, X. Zhu, G. Chen, and C. Liu, “Mining Maximal Frequent Itemsets from Data Streams,” Information Sciences, Vol. 33, No. 3, pp. 251–262, 2007.
[25] D. Perera, J. Kay, I. Koprinska, K. Yacef, and O. R. Zaiane, “Clustering and Sequential Pattern Mining of Online Collaborative Learning Data,” IEEE Trans. On Knowledge and Data Engineering, Vol. 21, No. 6, pp. 759–772, 2009.
[26] J. M. Ren and J. R. Jang, “Discovering Time-Constrained Sequential Patterns for Music Genre Classification,” IEEE Trans. on Audio, Speech, and Language Processing, Vol. 20, No. 4, pp. 1134–1144, 2012.
[27] S. Tanbeer, C. Ahmed, B. Jeong, and Y. Lee, “Efficient Single-Pass Frequent Pattern Mining Using a Prefix-Tree,” Information Sciences, Vol. 179, No. 5, pp. 559–583, 2009.
[28] L.Wang and J. Liu, “Using Oriented Graph to Discover Time-Interval Sequential Patterns,” Proc. of Int. Conf. on Computer Science and Software Engineering, pp. 634–637, 2008.
[29] S. J. Yen, Y. S. Lee, C. K.Wang, and J. W.Wu, “An Efficient Approach for Mining Frequent Patterns Based on Traversing a Frequent Pattern Tree,” Proc. of Int. Conf. on Computer Science and Software Engineering, pp. 354–357, 2008.
[30] W. Zhang, H. Liao, and N. Zhao, “Research on the FP Growth Algorithm about Association Rule Mining,” Proc. of the 10th Int. Conf. on Business and Information Management, pp. 315–318, 2008.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊