National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

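Author: 陸坤義
Thesis title: 應用分層隨機抽樣和動差保留法採掘重要關聯規則之方法
Thesis title (English): Mining Interesting Association Rules by Stratified Sampling and Moment-Preserving Thresholding
Advisor: 許玟斌
Degree: Master's
Institution: 東海大學 (Tunghai University)
Department: 資訊工程與科學系 (Information Engineering and Science)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2003
Academic year of graduation: 91 (ROC calendar)
Language: Chinese
Number of pages: 56
Keywords (Chinese): 分層隨機抽樣; 動差保留法; 高頻項目集; 關聯規則; Apriori演算法
Keywords (English): stratified sampling; moment-preserving thresholding approach; frequent itemsets; association rules; Apriori algorithm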
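Data mining is currently a very active research field, and the question of how to generate association rules quickly has been widely discussed and studied. Generating association rules proceeds in two phases. The first phase finds, in the transaction database, the frequent itemsets whose support exceeds a user-specified threshold. The second phase uses those frequent itemsets to generate high-confidence association rules. Because the database is very large, generating frequent itemsets through repeated database accesses is time-consuming, so an efficient and effective algorithm is needed to reduce the cost. This study therefore builds on stratified sampling and moment-preserving thresholding, with the aim of reducing the time needed to mine frequent itemsets and association rules from the database. When computing itemset support, many algorithms [5][11][12][16] ignore purchase quantities, which introduces error into the support values and in turn reduces the usefulness of the frequent itemsets. To avoid this error and improve decision support, this study uses the purchase quantity as a weight when computing support during frequent-itemset mining. The proposed algorithm has five steps. Step 1: generate a transaction database with a simulation program. Step 2: classify each transaction by profit using moment-preserving thresholding. Step 3: draw a sufficient sample from the classes produced in the previous step by stratified random sampling. Step 4: scan the sample with the Apriori algorithm to mine frequent itemsets. Step 5: generate association rules from the frequent itemsets. Simulations show that the proposed algorithm mines frequent itemsets from the database effectively and quickly, and that the resulting association rules are of greater reference value to decision makers.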
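Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes only two profit classes, uses the closed-form two-level solution of moment-preserving thresholding from Tsai [19], and allocates the sample in proportion to stratum size. The function names, the `fraction` and `seed` parameters, and the `low`/`high` labels are hypothetical.

```python
import math
import random


def moment_preserving_threshold(values):
    """Return (threshold, p0) for a two-class split that preserves the
    first three moments of `values` (two-level case, after Tsai [19])."""
    n = len(values)
    m1 = sum(values) / n
    m2 = sum(v ** 2 for v in values) / n
    m3 = sum(v ** 3 for v in values) / n
    cd = m2 - m1 * m1  # population variance; zero if all values are equal
    if cd <= 0:
        raise ValueError("values must contain at least two distinct numbers")
    # z0 and z1 are the roots of z^2 + c1*z + c0 = 0.
    c0 = (m1 * m3 - m2 * m2) / cd
    c1 = (m1 * m2 - m3) / cd
    root = math.sqrt(c1 * c1 - 4 * c0)
    z0, z1 = (-c1 - root) / 2, (-c1 + root) / 2
    p0 = (z1 - m1) / (z1 - z0)  # fraction of values assigned to the low class
    # The threshold is the p0-quantile of the data (small tolerance for rounding).
    ordered = sorted(values)
    k = min(n, max(1, math.ceil(p0 * n - 1e-9)))
    return ordered[k - 1], p0


def stratified_sample(transactions, profits, fraction, seed=0):
    """Split transactions into low/high profit strata (assumed two classes),
    then draw the same fraction from each stratum, at least one per stratum."""
    threshold, _ = moment_preserving_threshold(profits)
    strata = {"low": [], "high": []}
    for transaction, profit in zip(transactions, profits):
        strata["low" if profit <= threshold else "high"].append(transaction)
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        if members:
            k = min(len(members), max(1, round(fraction * len(members))))
            sample.extend(rng.sample(members, k))
    return sample
```

Sampling within each stratum, rather than from the whole database, keeps the rarer high-profit transactions represented even when the sampling fraction is small.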
Data mining is an important research area, and among the proposed mining methods association rule mining is the most widely studied because of its broad range of applications. Association rule mining calls for an efficient algorithm and can be divided into two phases. Phase 1: mine frequent itemsets from the database. Phase 2: use the frequent itemsets to generate association rules. The proposed algorithm is based on stratified sampling and moment-preserving thresholding. Because it works on a reduced dataset, the proposed algorithm mines association rules efficiently. Moreover, we take the purchase quantities of items into account in the support-counting phase to make the frequent itemsets and association rules more convincing. The proposed algorithm has five steps. Step 1: generate a transaction database with our simulator. Step 2: classify transactions by profit using moment-preserving thresholding. Step 3: draw a sample database by stratified sampling. Step 4: mine frequent itemsets in the sample database with the Apriori algorithm. Step 5: generate association rules from the frequent itemsets. Simulation results show that the proposed algorithm mines frequent itemsets efficiently and that the association rules generated with the proposed quantitative support-counting method are more valuable.
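Quantity-aware support counting combined with Steps 4 and 5 might look like the sketch below. The abstract does not give the thesis's weighting formula, so the one used here is an assumption made purely for illustration: a transaction contributes the smallest purchased quantity among the itemset's items, and the sum is normalized by the total of each transaction's largest quantity. The names `weighted_support`, `apriori`, and `rules` are hypothetical, and each transaction is assumed to be a dict mapping item to quantity.

```python
from itertools import combinations


def weighted_support(itemset, transactions):
    """Assumed quantity-weighted support, always in [0, 1]."""
    total = sum(max(t.values()) for t in transactions if t)
    hits = sum(min(t.get(i, 0) for i in itemset) for t in transactions)
    return hits / total if total else 0.0


def apriori(transactions, min_support):
    """Level-wise Apriori search; returns {frozenset: weighted support}."""
    if min_support <= 0:
        raise ValueError("min_support must be positive")
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]
    while level:
        kept = {}
        for candidate in level:
            support = weighted_support(candidate, transactions)
            if support >= min_support:
                kept[candidate] = support
        frequent.update(kept)
        # Join step: unite pairs of frequent k-itemsets that differ in one item.
        joined = {a | b for a in kept for b in kept if len(a | b) == len(a) + 1}
        # Prune step: keep a candidate only if all its k-subsets are frequent.
        level = [c for c in joined
                 if all(frozenset(s) in kept
                        for s in combinations(c, len(c) - 1))]
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples.
    Confidence is the weighted support of the whole itemset divided by
    the weighted support of the antecedent."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence))
    return found
```

With the minimum-quantity weighting, adding an item to an itemset can never raise its support, so the usual Apriori pruning remains valid and confidence stays at or below 1. A different weighting would need that property checked separately.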
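Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Data Mining
1.2 Data Mining Techniques
1.2.1 Market Basket Analysis
1.2.2 Classification Techniques
1.2.3 Clustering Techniques
1.3 Research Motivation and Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 The Apriori Algorithm
2.2 The DHP Algorithm
2.3 The Pincer-Search Algorithm
2.4 The Sampling Algorithm
Chapter 3 Theoretical Framework
3.1 Research Procedure
3.2 Theoretical Methods
3.2.1 Sampling Theory
3.2.2 Moment-Preserving Thresholding
3.2.3 Quantitative Support Computation
3.2.4 Quantitative Confidence Computation
Chapter 4 Simulation Experiments
4.1 Experimental Environment
4.2 Performance Evaluation of the Traditional Support Computation
4.3 Performance Evaluation of the QSC Method
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References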
[1] A. Tabatabai, “Edge Location and Data Compression for Digital Imagery,” Ph.D. dissertation, School of Elect. Engrg., Purdue University, Dec. 1981.
[2] C. C. Aggarwal and P. S. Yu, “A New Approach to Online Generation of Association Rules,” IEEE Trans. on Knowledge and Data Engineering, Vol. 13, No. 4, pp. 527-540, 2001.
[3] C. C. Aggarwal, C. Procopiuc, and P. S. Yu, “Finding Localized Associations in Market Basket Data,” IEEE Trans. on Knowledge and Data Engineering, Vol. 14, No. 1, pp. 51-62, 2002.
[4] D. I. Lin and Z. M. Kedem, “Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set,” IEEE Trans. on Knowledge and Data Engineering, Vol. 14, No. 3, pp. 553-556, 2002.
[5] E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. D. Ullman, and C. Yang, “Finding Interesting Associations without Support Pruning,” IEEE Trans. on Knowledge and Data Engineering, Vol. 13, No. 1, pp. 64-78, 2001.
[6] F. Berzal and J. C. Cubero, “TBAR: An Efficient Method for Association Rule Mining in Relational Databases,” Data and Knowledge Engineering, Vol. 37, No. 1, pp. 47-64, 2001.
[7] G. Szego, “Orthogonal Polynomials,” Vol. 23, 4th ed., Amer. Math. Soc., Providence, R.I., 1975.
[8] H. Toivonen, “Sampling Large Databases for Association Rules,” Proc. Int’l Conf. Very Large Data Bases, pp. 134-145, 1996.
[9] J. Han and Y. Fu, “Mining Multiple-Level Association Rules in Large Databases,” IEEE Trans. on Knowledge and Data Engineering, Vol. 11, No. 5, pp. 798-805, 1999.
[10] J. R. Quinlan, “C4.5: Programs for Machine Learning,” Morgan Kaufmann, 1993.
[11] J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, pp. 81-106, 1986.
[12] J. S. Park, M. S. Chen, and P. S. Yu, “An Effective Hash-Based Algorithm for Mining Association Rules,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 175-186, 1995.
[13] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and Regression Trees,” Wadsworth, 1984.
[14] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. Int’l Conf. Very Large Data Bases, pp. 487-499, 1994.
[15] R. T. Ng and J. Han, “Efficient and Effective Clustering Methods for Spatial Data Mining,” Proc. Int’l Conf. Very Large Data Bases, pp. 144-155, 1994.
[16] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An Efficient Data Clustering Method for Large Databases,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 103-114, 1996.
[17] V. Ganti, J. Gehrke, and R. Ramakrishnan, “Mining Very Large Databases,” IEEE Computer Society, Vol. 32, No. 8, pp. 38-45, 1999.
[18] V. Ganti, R. Ramakrishnan, and J. Gehrke, “Clustering Large Datasets in Arbitrary Metric Spaces,” Proc. Int’l Conf. Data Engineering, pp. 502-511, 1999.
[19] W. H. Tsai, “Moment-Preserving Thresholding: A New Approach,” Computer Vision, Graphics, and Image Processing, Vol. 29, pp. 377-393, 1985.
[20] W. Mendenhall, L. Ott, and R. L. Scheaffer, “Elementary Survey Sampling,” Wadsworth, 1986.