跳到主要內容

臺灣博碩士論文加值系統

(44.210.132.31) 您好!臺灣時間:2022/08/19 19:37
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:陳姵吟
研究生(外文):Pey-Yen Chen
論文名稱:以解群方式加速關聯法則探勘之平行處理架構
論文名稱(外文):Performance Speed Up of Parallel Association Rule Mining by De-Clustering Methods
指導教授:曾守正曾守正引用關係
指導教授(外文):Frank S. C. Tseng
學位類別:碩士
校院名稱:國立高雄第一科技大學
系所名稱:資訊管理所
學門:電算機學門
學類:電算機一般學類
論文種類:學術論文
論文出版年:2002
畢業學年度:90
語文別:中文
論文頁數:86
中文關鍵詞:關聯法則解群最小擴張樹最短擴張路徑平行運算各個擊破法
外文關鍵詞:Shortest Spanning PathParallel ComputingMinimum Spanning TreeDivide-and-ConquerDe-ClusteringAssociation Rule
相關次數:
  • 被引用被引用:0
  • 點閱點閱:151
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0

應用資料探勘(Data Mining)技術的目的無非是希望在龐大的資料中,找到有用的資訊以供決策者作為參考的依據。資訊時代的社會中,由於資料取得的途徑增多,資料量會隨時間不斷的增長及資料型態日趨複雜化,在資料的探勘過程中,需花費相當多的時間來找出結果,而有效能不彰的問題產生。因此,本研究的主要目的就是探討如何提昇關聯法則Apriori中產生高頻項目組(Large Itemsets)的效能,並提出以平行處理的架構改善其效率。
在本研究中,我們觀察到了在多維度的資料中,其稀疏的特性,提出先行將資料轉置成二元格式之稀疏矩陣(Sparse Matrix),用以快速產生一項目之高頻項目組。隨後,帶入解群(De-Clustering)的概念,將之應用到平行處理的領域中;分別採用最小擴張樹(Minimum Spanning Tree, MST)與最短擴張路徑(Shortest Spanning Path, SSP)的方式,用以記錄最鄰近的資料,並以解群的方式將資料均分至各工作站中,使其滿足子群與子群之間的相似度高,而子群也足以代表母群的特性。最後,將關聯法則Apriori演算法中最冗長耗時的步驟-產生高頻項目組的任務,以Row-prime order的順序均分至各工作站,使其任務間完全獨立,不會導致互相牽制而產生效能低落的現象。本研究利用上述的架構,使其達到資料獨立性與任務獨立性,經過實驗結果可驗證實現效能與效率大幅提昇之目標。


Data mining technologies are prolific in contemporary applications to explore versatile and previously untapped knowledge for supporting decision makers. Owing to the ways of obtaining data are versatile in this new information era, the counts of data and dimensions will be supposed to be continuously increasing and getting more complicated. Besides, owing to the huge target data volume, most of the algorithms suffer from the elaboration on finding all candidates that fit the subjective conditions, which is a process prone to time-consuming. Therefore, the main objective of our work is to speedup the large itemsets generation process by employing a parallel architecture for the Apriori method.
We observed that most of the itemsets in practical transaction data are prone to being sparse. In Our approach, transaction data will be transformed into a binary sparse matrix, which will be a base for generating the first stage large itemsets. Then, we broach the concept of de-clustering to implement the generations process in parallel, which adopts the idea of Minimum-Spanning-Tree (will be abbreviated as MST) and Shortest-Spanning-Path (will be abbreviated as SSP). We conduct both MST and SSP approaches to find out the nearest transaction data and to link them together. Then, the finally obtained MST and SSP are used to evenly de-cluster all transaction data into subgroups. All subgroups are not only quite similar to each other, but also quite similar to the original group. Finally, the most time-consuming task ─generating the candidate itemsets─ will be decomposed into subtasks and dispatched to sites in Row-prime order for processing all subgroups in parallel. We have compared MST and SSP approaches and conclude that although their obtained precision is quite similar, however, the later is superior than the former from the performance viewpoint. In our framework, thanks to the data and tasks are distributed independently, communications in-between occur only at the very beginning and ending of the whole process. That makes our work a very efficient and effective approach.

中文摘要i
英文摘要ii
誌 謝iv
目 錄vi
表目錄ix
圖目錄x
壹、緒論1
一、研究背景與動機1
二、研究目的2
三、重要性3
四、研究架構與流程4
五、論文結構5
貳、相關研究6
一、資料探勘與關聯法則6
二、關聯法則之推導技術9
(一)序列演算法9
(二)平行演算法16
三、平行處理(PARALLEL PROCESSING)20
(一)以硬體平台區分22
(二)以平行型態區分23
(三)以負載平衡方式區分24
四、資料解群(DE-CLUSTERING)方式25
(一)最小擴張樹(Minimal Spanning Tree, MST)27
(二)最短擴張路徑(Shortest Spanning Path, SSP)28
五、資料配置方式(DATA ALLOCATION)30
六、各個擊破法(DIVIDE-AND-CONQUER)31
參、研究架構與方法33
一、問題描述33
二、系統架構34
三、系統流程說明36
(一)資料轉換36
(二)資料切割平行化38
(三)任務切割平行化45
四、範例說明47
(一)資料轉換階段48
(二)資料切割平行化階段49
(三)任務切割平行化階段54
肆、實驗結果與討論56
一、實驗設備及資料說明56
二、實驗介紹58
三、數據分析及效能評估58
四、大量資料項目的效能探討64
伍、結論與未來研究66
一、結論66
二、未來研究67
參考文獻69


[1]Agrawal, R., and Shafer, J. C., "Parallel Mining of Association Rules," IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996, pp. 962-969.[2]Agrawal, R., and Srikant, R., “Fast Algorithms for Mining Association Rules in Large Databases,” Proceedings of the 20th Internal Conference on Very Large Data Bases-VLDB’94, Sep. 1994, pp. 487-499.[3]Agrawal, R., Imielinski, T., and Swami, A., “Mining Association Rule between Sets of Items in Large Databases,” Proceedings of the ACM SIGMOD International Conference on Management of Data-SIGMOD’93, May 1993, pp. 207-216.[4]Bayardo, R., and Agrawal. R., “Mining the Most Interesting Rules,” Proceedings Of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining-SIGKDD’99, San Diego, CA USA, 1999, pp. 145-154.[5]Berry, Michael J. A., and Linoff, G., Data Mining Techniques: For Marketing, Sales, and Customer Support. New York: John Wiley & Sons, 1997.[6]Brin, S., Motwani, R., Ullman, J. D., and Tsur, S., “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” Proceedings of the ACM SIGMOD International Conference on Management of Data-SIGMOD’97, 1997, pp. 255-264.[7]Chen, M. S., Han, J., and Yu, P. S. “Data Mining: An Overview from Database Perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, 1996, pp. 866-883.[8]Cheung, D., and Xiao, Y., “Effect of Data Skewness in Parallel Mining of Association Rule,” In 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, April 1999, pp. 48-60.[9]Cheung, D.W., Han, J., Ng, V. T., Fu, A.W., and Fu, Y. “A Fast Distributed Algorithm for Mining Association Rules”, Proceedings of 4th International Conference on Parallel and Distributed Information Systems, 1996, pp. 31-42.[10]Fang, M. T., Lee, R. C. T., and Chang, C. C., “The Idea of De-Clustering and its Applications”, Proceedings of 12th International Conference on Very Large Data Bases-VLDB’86, Kyoto, Japan, Aug. 1986, pp. 181-188.[11]Goodchild, M. F., and Grandfield, A. W., “Optimizing raster Storage: an examination of four alternatives,” Proceedings of Auto-Carto 6, Vol. 1, October 1983, pp. 400-407.[12]Graham, R. and P. Hell, “On the History of the Minimum Spanning Tree Problem,” Annals of the History of Computing, Vol. 7, No. 1, 1985, pp. 43-57.[13]Han, J., and Fu, Y., “Discovery of Multiple-Level Association Rules from Large Databases,” Proceedings of the 21st International Conference on Very Large Data Bases-VLDB’95, Sep 1995, pp. 420-431.[14]IBM Almaden Research Center, http://www.almaden.ibm.com/cs/quest/syndata.html[15]Klir, G. J., and Folger, T. A., Fuzzy Sets, Uncertainty, and Information, Prentice Hall, Englewood Clinffs, NJ, 1988.[16]Kruskal, J. B., Jr., “On the Shortest Spanning Subtree of a Graph and Traveling Salesman Problem,” Proceedings of the American Mathematical Society, Vol. 7, 1956, pp. 48-50.[17]Lee R.C.T., Chang R.C., Tseng S.S. and Tsai T.T., Introduction to the Design and Analysis of Algorithms, Flag, Taipei, 2001.[18]Liu, H., and Setiono, R., “Feature Selection via Discretization,” IEEE Transaction on Knowledge and Data Engineering, vol. 9, no. 4, July/Aug. 1997, pp. 642-645.[19]McBryan, O. A. “State-of-The-Art in Highly Parallel Computer Systems,” in Noor, A.K. (ed.), Parallel Computations and Their Impact on Mechanics, ASME, New York, 31-47, 1987.[20]Park, J. S., Chen, M. S., and Yu, P. S., “An Effective Hash-based Algorithm for Mining Association Rules,” Proceedings of the ACM SIGMOD Conference on Management of Data -SIGMOD’95, May 1995, pp. 175-186.[21]Park, J. S., Chen, M. S., and Yu, P. S., “Efficient Parallel Data Mining for Association Rules,” Proceeding of the 1995 International Conference on Information and Knowledge Management, 1995, pp. 31-36.[22]Park, J. S., Chen, M. S., and Yu, P. S., “Using a Hash-Based Method with Transaction Trimming for Mining Association Rules”, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 5, 1995, pp. 813-825.[23]Prim, R.C., “Shortest Connection Networks and Some Generalizations,” Bell System Technical Journal, Vol. 36, 1957, pp. 1389-1401.[24]Quinn, M. J., “Analysis and Implementation of Branch and Bound Algorithms on a Hypercube Multicomputer”, IEEE Transactions on computers, Vol. 39, No.3, 1990, pp.384-387.[25]Robertson E. S., “The Parametric Description of Retrieval Tests,” Journal of Documentation, vol. 25, no. 1, pp. 1-27.[26]Sam, E., Han, H., Karypis, G. and Kumar, V., “Scalable Parallel Data Mining for Association Rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 3, 2000, pp. 352-377.[27]Samet, H., The Design and Analysis of Spatial Data Structures, Addison-Wesley, 1989, pp.13-15.[28]Savasere A., Oiecinski E., and Navathe S “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proceeding of the 20th International Conference on Very Large Data Bases-VLDB’95, 1995, pp. 432-444.[29]Seitz, C. L., and Matisoo, J. “Engineering Limits On Computer Performance, “ Physics Today, Vol. 37, No. 5, 38-45, 1984.[30]Srikant, R., and Agrawal, R., “Mining Generalized Association Rules,” Proceeding of the 20th International Conference on Very Large Data Bases-VLDB’94, Sep 1994, pp. 407-419.[31]Wru, S. Y. and Leu, Y. “An Effective Boolean Algorithm for Mining Association Rules in Large Database,” Proceedings of the 6th International Conference on Database System for Advanced Applications, 1999, pp. 179-186.[32]Zadeh, L.A., “Fuzzy sets,” Information and Control, Vol. 8, 1965, pp. 338-353.[33]Zaki M. J., “Parallel and Distributed Association Mining: A Survey,” IEEE Concurrency, 1999, pp.14-25[34]王仁傑,“有效找出較長資料項目型樣的關聯法則之研究”,逄甲大學訊工程研究所碩士論文,2000[35]何仁傑,“從大型資料庫中掘感興趣的型樣”,輔仁大學資訊工程研究所碩士論文,2000[36]陳彥良等,“資料間隱含關係的挖掘與展望”,二十一世紀台灣湧現中的資訊管理議題專家研討會論文集,大溪:鴻禧山莊,2001[37]陳美蕙,“利用屬性分群演算法有效地挖掘關聯式規則”,中興大學資訊科學研究所碩士論文,2000[38]黃明華,“大型資料庫中關聯式規則平行發掘演算法”,臺灣科技大學管理研究所資訊管理學程碩士論文,1999

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top