
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Student: 陳彥文 (Yen-Wen Chen)
Thesis title: An Efficient Sequential Pattern Mining System (Chinese: 一個有效率的循序樣本探勘系統)
Advisor: 張昭憲 (Jau-Shien Chang)
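Degree: Master's
Institution: Tamkang University
Department: Department of Information Management
Discipline: Computing
Field: General Computing
Thesis type: Academic thesis
Year of publication: 2004
Graduation academic year: 92 (ROC calendar, i.e. 2003-2004)
Language: Chinese
Pages: 42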
Keywords: sequential patterns; association rules; data mining; database (Chinese: 循序樣本; 關聯規則; 資料探勘; 資料庫)
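To speed up sequential pattern mining, this study proposes three improvements and uses them to build an efficient sequential pattern mining system, ESPM (an Efficient Sequential Pattern Mining system). The three techniques are as follows. First, we use the frequent 1-itemsets to build Item-Position-Lists, which exclude the items in each transaction that need not be considered and thus substantially reduce the number of comparisons during mining. Second, we observe that the length of each transaction in the database can be reduced according to support, which further reduces the number of comparisons. Third, following the approach of [2], this study uses a hash table to speed up the time-consuming generation of the frequent 2-itemsets.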
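The first two techniques can be illustrated with a small Python sketch. It assumes, in the spirit of SPADE-style vertical id-lists, that an Item-Position-List maps each frequent item to the (sequence id, transaction position) pairs where it occurs, and that database reduction simply drops items below the minimum support and discards transactions left empty. The abstract does not give ESPM's exact list layout or trimming rule, so the function names (`frequent_items`, `reduce_database`, `build_item_position_lists`, `sequence_support`) and the toy database are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy database: each customer sequence is a list of
# transactions (item sets) in time order.
TOY_DB = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a", "d"}, {"c"}],
    [{"b"}, {"e"}, {"a"}, {"c", "e"}],
]


def frequent_items(db, min_support):
    """Return the items contained in at least min_support sequences."""
    counts = defaultdict(int)
    for seq in db:
        # An item counts once per sequence, however often it recurs.
        for item in set().union(*seq):
            counts[item] += 1
    return {item for item, c in counts.items() if c >= min_support}


def reduce_database(db, keep):
    """Assumed reduction rule: trim every transaction to the frequent items
    and drop transactions that become empty (time order is preserved)."""
    return [[t & keep for t in seq if t & keep] for seq in db]


def build_item_position_lists(db):
    """Map each item to its (sequence id, transaction position) pairs."""
    ipl = defaultdict(list)
    for sid, seq in enumerate(db):
        for pos, transaction in enumerate(seq):
            for item in sorted(transaction):
                ipl[item].append((sid, pos))
    return dict(ipl)


def sequence_support(ipl, a, b):
    """Count sequences in which an occurrence of a lies in an earlier
    transaction than an occurrence of b, i.e. the support of <(a)(b)>."""
    first_a = {}
    for sid, pos in ipl.get(a, []):
        first_a[sid] = min(pos, first_a.get(sid, pos))
    matched = {sid for sid, pos in ipl.get(b, [])
               if sid in first_a and first_a[sid] < pos}
    return len(matched)


if __name__ == "__main__":
    keep = frequent_items(TOY_DB, min_support=2)
    ipl = build_item_position_lists(reduce_database(TOY_DB, keep))
    print(sorted(keep))
    print(ipl["a"])
    print(sequence_support(ipl, "a", "c"), sequence_support(ipl, "c", "a"))
```

On this toy database the item `e` falls below the threshold, so it vanishes from every transaction and never receives a list. The support of <(a)(c)> is then computed from the two lists alone: `sequence_support(ipl, "a", "c")` returns 3, while `sequence_support(ipl, "c", "a")` returns 0, with no rescan of the database.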
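To evaluate the performance of ESPM, we ran experiments on synthetic transaction data ranging from 100,000 to 1,000,000 transactions. The results show that, compared with GFP2 [7], ESPM saves more than 34% of execution time on average. Moreover, as the test data grows from 100,000 to 1,000,000 transactions, ESPM's execution time grows almost linearly, indicating good potential for handling large databases.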
To speed up sequential pattern mining, this thesis proposes three effective methods and uses them as the basis for an Efficient Sequential Pattern Mining system (ESPM). The three methods are as follows. First, modified vertical TID-lists are designed to reduce the number of comparisons during mining. Second, the length of each transaction, and thus the matching time, can be further reduced by removing items with low support. Third, similar to [2], a hash table is used to speed up the time-consuming generation of frequent 2-itemsets.
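The hash-table step borrowed from [2] (the DHP algorithm) can be sketched as below. During the scan that counts single items, every 2-itemset of each transaction is hashed into a bucket; because a bucket's total is an upper bound on the count of any pair hashed into it, a candidate pair whose bucket stays below the minimum support can be discarded before it is ever counted. This is a minimal sketch over plain itemset transactions with integer items; the bucket count and the hash function `bucket_of` are hypothetical choices, not ESPM's.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    """Hypothetical hash for a 2-itemset (x, y) of integer items, x < y."""
    x, y = pair
    return (10 * x + y) % num_buckets


def dhp_first_pass(transactions, min_support, num_buckets=7):
    """Single scan: count 1-itemsets and hash every 2-itemset into a bucket,
    then keep only candidate pairs whose bucket can still reach min_support."""
    item_counts = Counter()
    buckets = [0] * num_buckets
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            buckets[bucket_of(pair, num_buckets)] += 1

    frequent = sorted(i for i, c in item_counts.items() if c >= min_support)
    # A bucket total is an upper bound on the support of every pair hashed
    # into it, so a low bucket safely rules its pairs out.
    candidates = [
        pair
        for pair in combinations(frequent, 2)
        if buckets[bucket_of(pair, num_buckets)] >= min_support
    ]
    return frequent, candidates


if __name__ == "__main__":
    db = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
    frequent, candidates = dhp_first_pass(db, min_support=2)
    print(frequent)    # [1, 2, 3, 5]
    print(candidates)  # [(1, 3), (2, 3), (2, 5), (3, 5)]
```

With these four transactions and a minimum support of 2, the frequent items are `[1, 2, 3, 5]`. Of the six pairs an Apriori-style join would generate from them, `(1, 2)` and `(1, 5)` land in buckets holding only 1, so just four candidates remain to be counted in the next pass.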
To verify the efficiency of ESPM, synthetic datasets ranging in size from 100,000 to 1,000,000 transactions are used in the experiments. The results show that, compared with GFP2 [7], ESPM saves more than 34% of CPU time on average. In addition, as the dataset size increases from one hundred thousand to one million transactions, the CPU time grows almost linearly, which demonstrates that ESPM has excellent potential for processing large databases.
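Chapter 1 Introduction
Chapter 2 Overview of Sequential Pattern Mining Algorithms
2.1 Apriori-Like Algorithms
2.2 The GFP2 Algorithm
Chapter 3 The ESPM Algorithm
3.1 Database Reduction (DBR)
3.2 2-Candidate Reduction
3.3 Item-Position-List Construction
3.4 Efficiency Analysis of the Algorithm
Chapter 4 Experimental Results
4.1 Generating the Transaction Databases
4.2 Experimental Data and Discussion
Chapter 5 Applying the ESPM System to Lottery Draw Prediction
Chapter 6 Conclusion
References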
[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile, 1994, pp. 487-499.
[2] J. S. Park, M.-S. Chen, and P. S. Yu. An effective hash-based algorithm for mining association rules. In Proceedings of the ACM SIGMOD International Conference on Management of Data, May 1995.
[3] D. W. Cheung, V. T. Ng, A. W. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 911-922, 1996.
[4] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the 11th International Conference on Data Engineering (ICDE), 1995.
[5] R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the 5th International Conference on Extending Database Technology (EDBT), 1996.
[6] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc.
[7] Y. L. Chen, S. S. Chen, and P. Y. Hsu. Mining hybrid sequential patterns and sequential rules. Information Systems, Vol. 27, No. 5, pp. 345-362.
[8] P. Berkhin, J. D. Becher, and D. J. Randall. Interactive path analysis of web site traffic. In Proceedings of KDD 2001, pp. 414-419.
[9] J. Z. Ouh, P. H. Wu, and M. S. Chen. Experimental results on a constraint based sequential pattern mining for telecommunication alarm data. In Proceedings of WISE (2) 2001, pp. 186-
[10] J. Pei, J. Han, B. Mortazavi-Asl, and H. Zhu. Mining access patterns efficiently from web logs. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2000, pp. 396-407.
[11] 沈清正, 陳仕昇, 高鴻斌, 張元哲, 陳家仁, 黃琮盛, and 陳彥良. 資料間隱含關係的挖掘與展望 [Mining hidden relationships in data and future prospects].