National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

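Graduate student: 張芳玉
Graduate student (English name): ANNIE CHANG
Thesis title: 探勘正相關及封閉項目集之關聯規則演算法
Thesis title (English): Mining Association Rules Algorithm with Positive Correlation and Closed Itemsets
Advisor: 黃仁鵬
Degree: Master's
Institution: 南台科技大學 (Southern Taiwan University of Technology)
Department: Department of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Publication year: 2006
Graduation academic year: 94 (ROC calendar)
Language: Chinese
Pages: 77
Keywords (Chinese): 正相關項目集、封閉項目集、資料探勘、關聯規則
Keywords (English): positive correlation itemsets; closed itemsets; data mining; association rule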
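Mining the associations among items in a transaction database helps support organizational decision making. The mining algorithm determines both mining performance and how effectively resources are used. One of the most frequently criticized problems in earlier research is that the filtering mechanism is weak: too many association rules are generated, and decision makers are swamped by them. Proposing an effective way to condense the association rules is therefore the main direction of this study.
The PCP (Positive Correlation and Closed Itemsets by Phase) algorithm proposed in this study builds on the GRA (Gradation Reduction Approaches) algorithm and adds the concept of closed itemsets. GRA's phase-wise reduction of transaction length already removes a large number of non-frequent patterns, but the frequent itemsets are still not concise enough and too many association rules remain, so users cannot accurately interpret and use the rules produced.
This study removes negatively correlated itemsets to shrink the set of frequent itemsets at each phase, so that the frequent itemsets mined are fewer but more representative. Among the frequent itemsets generated at each phase length, it finds the closed itemsets that contain itemsets of the previous phase length and deletes those contained frequent itemsets, and finally mines a concise set of association rules efficiently. In practice, a database is usually larger than main memory. To address this, the study also proposes a modified version based on PCP, the PCP-M (Positive Correlation and Closed Itemsets by Phase - Modified Version) algorithm. PCP-M performs the mining task by partitioning the database. Each sub-database requires only four physical I/O operations, and the number of physical I/O operations does not grow with the length of the frequent itemsets, which avoids excessive I/O time and improves execution efficiency and practicality.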
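The removal of negatively correlated itemsets can be illustrated with a minimal Python sketch. The abstract does not say which correlation measure PCP uses, so this sketch assumes lift (observed support divided by the support expected if the items were independent) and keeps an itemset only when its lift exceeds 1. The function names and the toy transactions are hypothetical and not taken from the thesis.

```python
from math import prod

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)

def lift(itemset, transactions):
    """Observed support divided by the support expected under independence.

    Assumption: lift stands in for whatever correlation measure PCP uses.
    """
    expected = prod(support([item], transactions) for item in itemset)
    return support(itemset, transactions) / expected if expected else 0.0

def positively_correlated(frequent, transactions):
    """Keep single items and itemsets whose lift is greater than 1."""
    return [s for s in frequent if len(s) < 2 or lift(s, transactions) > 1.0]

# Toy data (hypothetical): six transactions over the items a, b, c.
transactions = [frozenset(t) for t in ("ab", "ab", "ab", "c", "ac", "bc")]
candidates = [frozenset("ab"), frozenset("ac")]
kept = positively_correlated(candidates, transactions)
```

In this toy data, a and b co-occur more often than independence would predict, so {a, b} is kept, while {a, c} co-occurs less often than expected and is dropped.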
Mining the associations among items in a transaction database supports organizational decision making. The mining algorithm determines both the efficiency of data mining and how effectively resources are used. One of the most criticized problems in earlier association rule mining research is weak filtering: too many rules are generated, and decision makers are overwhelmed by them. The main objective of this research is therefore to present an effective method for producing a concise set of association rules.
This research presents the Positive Correlation and Closed Itemsets by Phase (PCP) algorithm, which modifies the Gradation Reduction Approaches (GRA) algorithm by adding the concepts of closed itemsets and positive correlation itemsets. Although GRA's phase-by-phase reduction of transaction length eliminates a large number of infrequent itemsets, it still generates too many frequent itemsets and association rules, which confuses decision makers when they try to use the rules.
The PCP algorithm uses closed itemsets and positive correlation itemsets to reduce the number of association rules and make the mining results more meaningful. Real-world databases are usually larger than main memory. To address this, we propose a modified algorithm, PCP-M (Positive Correlation and Closed Itemsets by Phase - Modified Version), which divides a large database into sub-databases and mines association rules from them. PCP-M requires only four physical I/O operations per sub-database, and this number is not affected by the length of the frequent itemsets. PCP-M thus avoids excessive I/O time and improves efficiency and practicality.
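The closed itemset idea can be sketched in Python as follows. The sketch assumes the standard definition (a frequent itemset is closed when no proper superset has the same support count) and checks each itemset only against frequent itemsets one item longer, which loosely mirrors the phase-by-phase processing described in the abstract. When every frequent itemset is present, that adjacent-length check is sufficient, because any superset with equal support implies an equal-support superset one item longer. The function name `closed_by_phase` and the sample counts are hypothetical, and the thesis's actual procedure may differ.

```python
def closed_by_phase(supports):
    """Return the closed itemsets from a {frozenset: support_count} mapping.

    Assumption: an itemset of length k is dropped when some frequent
    itemset of length k + 1 contains it and has the same support count.
    """
    # Group the frequent itemsets by length, one group per phase.
    by_length = {}
    for itemset, count in supports.items():
        by_length.setdefault(len(itemset), {})[itemset] = count

    closed = {}
    for k, level in by_length.items():
        longer = by_length.get(k + 1, {})
        for itemset, count in level.items():
            absorbed = any(
                itemset < bigger and count == bigger_count
                for bigger, bigger_count in longer.items()
            )
            if not absorbed:
                closed[itemset] = count
    return closed

# Hypothetical support counts: {b} always appears together with {a}.
supports = {frozenset("a"): 4, frozenset("b"): 3, frozenset("ab"): 3}
closed = closed_by_phase(supports)
```

Here {b} is removed because {a, b} carries the same support, while {a} survives because its support is higher than that of any superset.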
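Partition-based mining can be sketched in general terms as below. This is not PCP-M itself, whose four-I/O procedure is not described in the abstract. It is the generic two-pass partition technique, shown under that assumption: each partition is mined in memory for locally frequent itemsets, and the union of those candidates is then counted once more across all partitions. All names, the `max_len` cap, and the toy data are hypothetical.

```python
from itertools import combinations

def local_frequent(partition, min_ratio, max_len=2):
    """Itemsets (up to max_len items) frequent within a single partition."""
    counts = {}
    for transaction in partition:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(transaction), k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    threshold = min_ratio * len(partition)
    return {s for s, c in counts.items() if c >= threshold}

def partitioned_mine(partitions, min_ratio, max_len=2):
    """Two-pass partition mining; returns {itemset: global support count}."""
    # Pass 1: any globally frequent itemset is frequent in some partition,
    # so the union of local results is a complete candidate set.
    candidates = set()
    for partition in partitions:
        candidates |= local_frequent(partition, min_ratio, max_len)

    # Pass 2: count every candidate over all partitions.
    counts = dict.fromkeys(candidates, 0)
    for partition in partitions:
        for transaction in partition:
            for itemset in candidates:
                if itemset <= transaction:
                    counts[itemset] += 1

    total = sum(len(p) for p in partitions)
    return {s: c for s, c in counts.items() if c >= min_ratio * total}

# Toy data (hypothetical): two partitions of three transactions each.
partitions = [
    [frozenset("ab"), frozenset("ab"), frozenset("c")],
    [frozenset("ab"), frozenset("bc"), frozenset("c")],
]
result = partitioned_mine(partitions, min_ratio=0.5)
```

Because each partition is sized to fit in memory, the number of reads per partition stays fixed regardless of how long the frequent itemsets grow, which is the property the abstract attributes to PCP-M.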
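Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Data Mining
2.2 Association Rules
2.3 Correlation Rules
2.4 Related Algorithms
2.4.1 Apriori Algorithm
2.4.2 FP-growth Algorithm
2.4.3 NNIL Algorithm
2.4.4 A-close Algorithm
2.4.5 CLOSET Algorithm
2.4.6 GDA Algorithm
2.4.7 GRA Algorithm
Chapter 3 Research Method
3.1 Preface
3.2 PCP Algorithm
3.2.1 PCP Algorithm Flowchart
3.2.2 PCP Algorithm Pseudocode
3.2.3 Data Compression
3.2.4 Phase-wise Transaction Length Reduction and Merging
3.2.5 Filtering Unnecessary Itemsets with the Previous Phase's Frequent Itemsets
3.2.6 Filtering Negatively Correlated Frequent Itemsets
3.2.7 Phase-wise Filtering of Negatively Correlated Itemsets
3.2.8 Filtering Frequent Itemsets
3.2.9 Filtering the Previous Phase's Frequent Itemsets
3.2.10 Itemset Table
3.2.11 Complete Worked Example of the PCP Algorithm
3.3 Handling Large Databases
Chapter 4 Experiments
4.1 Experimental Environment
4.2 Experimental Parameters
4.3 Algorithms Tested
4.4 Experimental Design
4.5 Test Results and Analysis
4.6 Discussion
Chapter 5 Conclusions and Future Research
5.1 Conclusions
5.2 Applicability
5.3 Future Research Directions
References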