跳到主要內容

臺灣博碩士論文加值系統

(44.201.94.236) 您好!臺灣時間:2023/03/25 00:50
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:彭信啟
論文名稱:支持度不設限高頻樣式探勘之改良方法
論文名稱(外文):An Improved Algorrithm for Frequent Patterns Mining without Support Constraint
指導教授:曾秋蓉曾秋蓉引用關係
學位類別:碩士
校院名稱:中華大學
系所名稱:資訊工程學系(所)
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2008
畢業學年度:96
語文別:中文
論文頁數:71
中文關鍵詞:資料探勘資料採礦關連法則
外文關鍵詞:Knowledge discoveryData miningAssociation rule
相關次數:
  • 被引用被引用:1
  • 點閱點閱:147
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:1
在關連法則探勘中,Apriori和高頻樣式成長法(FP Growth )是常見的兩種演
算法,但是他們在探勘大型項目集時,必須先給定固定的最小支持度。然而,在
使用者沒有進階資訊前,並無法得知最佳的最小支持度為何。William Cheung 等
學者於2003年所提出的CATS樹方法中,允許使用者在執行過程中調整最小支持
度,並且能有效地改善傳統關連法則探勘。然而,在此演算法中,因為採用動態
建構樹狀結構,造成其過程複雜、探勘方式繁瑣。本文中,我們提出了排序壓縮
樹(Sorted Compressed Tree, 簡稱SC樹) 的資料結構來改進CATS樹。透過前處
理的方式,可靜態地建構所需的樹狀結構,縮短建構時間,並且在進行關連法則
探勘時只需由下到上探勘,取代高頻樣式成長法(FP Growth)中遞迴的方式,能夠
有效改善關連法則探勘法執行速度與效能。實驗結果顯示我們所提出的SC樹其
建構與探勘時間平均提升1.6倍與2.4倍,空間上平均較CATS樹建構與探勘節省
16%與15%,時間和空間都比CATS樹有效率。
Apriori and FP Growth are well-known algorithms for association rule mining.
Both of them require a given value of minimum support for mining the large itemsets. However, without enough knowledge about the application domain, an appropriated minimum support is hard to be chosen. To cope with this problem, William Cheung et al. proposed an algorithm called CATS. It allowed users to adjust the minimum support while pattern mining, and it also improved the efficiency of the association rule mining. Nevertheless, CATS builds its Tree structure ynamically, so that the mining process is complex and tedious. In this paper we present an improved algorithm of CATS Tree called Sorted Compressed Tree (SC Tree). After pre-sorting the data, the Tree structure can be built statically. Moreover, the association rules are mined in a bottom up style instead of recursion in the FP Growth algorithm. Hence the cost of association rule mining is reduced. From preliminary experimental results, execution time of SC Tree’sbuilding and mining improved 1.6 times and 2.4 times averagely. SC Tree saves about 16% and 15% of memory space in building and mining averagely respect to the CATS Tree, consequently the efficiency of association rule mining is improved.
第1 章概論..................................................................................................................1
1.1 研究背景.........................................................................................................1
1.2 研究動機.........................................................................................................3
1.3 研究目的.........................................................................................................4
第2 章相關研究..........................................................................................................6
2.1 資料探勘簡介.................................................................................................6
2.2 關連法則探勘...............................................................................................10
2.3 Apriori 演算法...............................................................................................12
2.4 高頻樣式成長(FP Growht)演算法..............................................................16
2.4.1 FP 樹建構法...................................................................................17
2.4.2 高頻樣式成長探勘法......................................................................19
2.5 壓縮排序交易序列樹演算法.......................................................................21
2.5.1 CATS 樹建構法...............................................................................21
2.5.2 CATS 樹探勘法..............................................................................24
2.6 群聚壓縮樹演算法.......................................................................................27
2.6.1 GC 樹的建構..................................................................................27
2.6.2 GC 樹探勘法..................................................................................29
第3 章排序壓縮樹....................................................................................................32
3.1 排序壓縮樹演算法.......................................................................................32
3.1.1 資料前處理......................................................................................32
3.1.2 排序壓縮樹建構法..........................................................................33
3.1.3 排序壓縮樹探勘法..........................................................................34
3.2 實例探討.......................................................................................................35
3.2.1 排序壓縮樹關連法則探勘實例......................................................35
3.2.2 排序壓縮樹的運算效能探討..........................................................37
3.2.3 排序壓縮樹的空間需求探討..........................................................40
第4 章實驗結果與評估............................................................................................43
4.1 實驗設計.......................................................................................................43
4.2 運算效能評估...............................................................................................44
4.2.1 建立探勘模型之運算效能評估......................................................44
4.2.2 探勘關連法則之運算效能評估......................................................46
4.3 空間需求評估...............................................................................................52
4.3.1 建立探勘模型之空間需求評估......................................................52
4.3.2 探勘關連法則之空間需求評估......................................................53
第5 章結論與未來研究方向....................................................................................60
ii
5.1 結論...............................................................................................................60
5.2 未來方向.......................................................................................................60
參考文獻......................................................................................................................61
[1] 林文修, “Data Mining 探索( 上)”, 決策論壇, 第13 期, 1998 。
http://lab.geog.ntu.edu.tw/course/ginformation/relative%20papers/Data%20Minin
g01.htm
[2] 陳星壁,“以關連法則探勘為基礎之電路版件維修輔助系統”,中華大學資訊
工程研究所碩士論文,2006。
[3] 葉志豪,“探勘家族特徵樹中之關聯規則”,國立中央大學資訊管理研究所碩
士論文,2004。
[4] 劉信義,”關聯法則挖礦法之研究-採用群聚壓縮樹演算法”,電子商務與
數位學習研討會,2006。
[5] 羅仙旺,“應用排序索引修剪技術之高效率關連式法則演算法之研究”,中華
大學資訊工程研究所碩士論文,2003。
[6] Agrawal R., and Srikant R., “Fast algorithms for mining association rules”,
Proceedings of the 20th International Conference on Very Large Data Bases
VLDB' 1994.
[7] Cheung W., and Zaiane O. R., “FrequentPattern Mining Without Candidate
Generation or Support Constraint”,Master’s Thesis,University of Alberta, 2002.
[8] Cheung W., and Zaiane O. R., “Incremental Mining of Frequent Patterns without
Candidate Generation or Support Constraint,” Publication of the Seventh
International Database Engineering and Applications Symposium, 2003.
[9] Chiou Chuang-kai, and Tseng Judy C.R., “A Scalable Association Rules Mining
Algorithm Based on Sorting, Indexing and Triming”, Proceedings of the Sixth
International Conference on Machine Learning and Cybernetics, Hong Kong,
August 19-22, 2007.
[10] El-Hajj M., and Zaiane O. R., “COFI Approach for Mining Frequent Itemsets
Revisited”, Proceedings of the 9th ACM SIGMOD Workshop on Research
Issues in Data Mining and Knowledge Discovery, DMKD June 13, 2004.
[11] El-Hajj M., and Zaiane O. R., “Non Recursive Generation of Frequent
K-itemsets from Frequent Pattern Tree Representations”,Data Warehousing and
Knowledge Discovery, 5th International Conference, DaWaK, 2003.
[12] Gao Jie., and Li Shaojun, “A Method of Improvement and Optimization on
Association Rules Apriori Algorithm”,Proceedings of the 6th World Congress
on Intelligent Control and Automation, June 21 - 23, 2006.
[13] Gopalan P., and Sucahyo Y. G., “Improving the Efficiency of Frequent Pattern
Mining by Compact Data Structure Design”, Proceedings of the 4th
International Conference on Intelligent Data Engineering and Automated
Learning (IDEAL2003), Hong Kong, 2003.
[14] Gopalan R., and Sucahyo Y. G., “High Performance Frequent Patterns Extraction
62
using Compressed FP-Tree”, Proceedings of the SIAM International Workshop
on High Performance and Distributed Mining, Orlando, USA, April 2004.
[15] Gopalan Raj P., and Sucahyo Yudho Giri, “ Fast Frequent Itemset Mining using
Compressed Data Representation”, AppliedInformatics 2003: 1203-1208,The
21st IASTED International Multi-Conference on Applied Informatics.
[16] Grahne G., and Zhu J., “Fast Algorithms for Frequent Itemset Mining Using
FP-Trees”,IEEE Transactions on Knowledge and Data Engineering, pages
1347-1362, 2005.
[17] Han J., and Pei J., and Yin Y., “Mining Frequent Patterns without Candidate
Generation”, Proc.2000 ACM-SIGMOD Int. Conf. on Management of Data.
Dallas, 2000.
[18] Hang Jian-Min., Chen Fu-Zan, and Zhang Qin, “An Efficient Algorithm for
Finding All Frequent Itemsets”,Machine Learning and Cybernetics, 2006
International Conference on Publication Date Aug. 2006
[19] http://www.almaden.ibm.com/cs/disciplines/iis/
[20] Li Y. C., and Chang C. C., “A New FP-Tree Algorithm for Mining Frequent
Itemsets”, AdvancedWorkshop on Content Computing (AWCC 2004).
[21] Liu Junqiang, Pan Yunhe, and Wang Ke, and Han Jiawei, ”Mining Frequent
Item Sets by Opportunistic Projection”,Proceedings of the Eighth ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining,
July 23-26, 2002.
[22] Pei. J., Han J., Mortazavi-Asl B., Pinto H. , Chen Q., Dayal U., and Hsu M. C.,
“PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern
Growth” , Proceedings of 17th International Conference on Data Engineering,
pp. 215-224, April 2-6, 2001.
[23] Sucahyo Y. G., and Gopalan R. P., “CT-ITL: Efficient Frequent Item Set Mining
Using a Compressed Prefix Tree with Pattern Growth”, Proceedings of 14th
Australasian Database Conference, delaide, Australia, 2003.
[24] Sucahyo Y. G., and Gopalan R. P., “CT-PRO: A Bottom-Up Non Recursive
Frequent Itemset Mining Algorithm Using Compressed FP-Tree Data Structure”,
Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining
Implementations (FIMI), Brighton, UK, 2004.
[25] Wang J., Han J., Lu Y., and Tzvetkov P., “TFP: An Efficient Algorithm for
Mining Top-K Frequent Closed Itemsets”, IEEE Transactions on Knowledge
and Data Engineering, Vol. 17, No. 5, pp.652-664, May 2005.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊