National Digital Library of Theses and Dissertations in Taiwan

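Graduate Student: Yu-Yang Liu (劉育瑒)
Thesis Title: Parallel Genetic-Fuzzy Mining with MapReduce Architecture (Chinese title: 運用MapReduce架構之平行遺傳模糊資料探勘)
Advisor: Tzung-Pei Hong (洪宗貝)
Degree: Master's
Institution: National Sun Yat-sen University
Department: Department of Computer Science and Engineering (graduate program)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis Type: Academic thesis
Year of Publication: 2015
Graduation Academic Year: 103 (Republic of China calendar, i.e., 2014-2015)
Language: English
Number of Pages: 87
Keywords: MapReduce, genetic algorithm, FP-growth, fuzzy mining, data preprocessing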
Fuzzy data mining can find hidden linguistic association rules by transforming quantitative information into fuzzy membership values. In this derivation process, good membership functions are key to the quality of the final results. Several earlier studies trained membership functions with genetic algorithms (GAs) and did improve the quality of the discovered rules. Such methods, however, suffer from long execution times in the training phase. Moreover, once appropriate membership functions have been found, mining frequent itemsets with them is also very time-consuming, just as in traditional data mining.

This thesis therefore proposes a series of approaches based on the MapReduce architecture to speed up the whole GA-fuzzy mining process. The contributions fall into three parts, all performed with MapReduce: data preprocessing, membership-function training by a GA, and fuzzy association-rule derivation. For data preprocessing, the proposed approach not only transforms the original data into the key-value format that MapReduce requires, but also joins the quantities of each item into a list, which avoids redundant database scans. For membership-function training, the fitness evaluation, which is the most time-consuming step, is distributed to shorten execution time. Finally, a distributed fuzzy rule-mining approach based on FP-growth is designed to find fuzzy association rules more efficiently. Experiments compare the single-processor and MapReduce versions, and the results show that the proposed approaches effectively reduce the execution time of the whole process.
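The preprocessing and distributed-fitness ideas above can be illustrated with a small, single-machine Python sketch that imitates MapReduce's map, shuffle, and reduce phases with ordinary functions. It is not the thesis's EDPFM-MR or PGFMFT-MR implementation: the toy transactions, the triangular membership functions given as (left, center, right), the chromosome layout (a dict from each item to its fuzzy regions), and the `fitness` function are all hypothetical. In particular, the thesis defines its own fitness function (Section 4.1.3); the stand-in here only counts large fuzzy 1-itemsets, i.e. fuzzy regions whose support reaches a minimum threshold.

```python
from collections import defaultdict
from functools import partial

# Hypothetical toy database: each transaction is a list of (item, quantity) pairs.
TRANSACTIONS = [
    [("A", 5), ("B", 10)],
    [("A", 8), ("C", 4)],
    [("B", 3), ("C", 9)],
    [("A", 2), ("B", 7), ("C", 6)],
]


def map_phase(transaction):
    """Map: emit one (item, quantity) key-value pair per purchased item."""
    for item, quantity in transaction:
        yield item, quantity


def shuffle(pairs):
    """Shuffle: group values by key, as the MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(item, quantities):
    """Reduce: join all quantities of one item into a single sorted list."""
    return item, sorted(quantities)


def preprocess(transactions):
    """A single pass over the data yields {item: [quantities]}."""
    pairs = (kv for t in transactions for kv in map_phase(t))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


def triangular(x, left, center, right):
    """Membership degree of x in a triangular fuzzy set (assumes left < center < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def fitness(chromosome, item_lists, n_transactions, min_support):
    """Hypothetical stand-in fitness: count fuzzy regions with support >= min_support.

    chromosome maps each item to a list of (left, center, right) regions.
    Transactions without the item contribute 0, so only its quantity list is summed.
    """
    large = 0
    for item, regions in chromosome.items():
        quantities = item_lists.get(item, [])
        for region in regions:
            support = sum(triangular(q, *region) for q in quantities) / n_transactions
            if support >= min_support:
                large += 1
    return large


def evaluate_population(population, item_lists, n_transactions, min_support):
    """Map step over chromosomes: each evaluation is independent, so a
    MapReduce job could run them as separate map tasks on different nodes."""
    evaluate = partial(fitness, item_lists=item_lists,
                       n_transactions=n_transactions, min_support=min_support)
    return list(map(evaluate, population))


if __name__ == "__main__":
    item_lists = preprocess(TRANSACTIONS)
    print(item_lists)  # {'A': [2, 5, 8], 'B': [3, 7, 10], 'C': [4, 6, 9]}
    population = [
        {item: [(0, 5, 10), (5, 10, 15)] for item in "ABC"},
        {item: [(0, 2, 4), (8, 10, 12)] for item in "ABC"},
    ]
    print(evaluate_population(population, item_lists, len(TRANSACTIONS), 0.32))  # [3, 0]
```

Because each item's quantities are already gathered into one list, evaluating a chromosome only walks those lists instead of rescanning the transactions. Since chromosomes are evaluated independently of one another, the `map` over the population is the part a real MapReduce job could spread across worker nodes.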
Thesis Approval Form
Acknowledgments
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
CHAPTER 1 Introduction
1.1 Motivation
1.2 Contributions
1.3 Organization
CHAPTER 2 Related Works
2.1 Genetic Fuzzy Mining
2.2 MapReduce
2.3 Parallel FP-Growth
CHAPTER 3 Efficient Data Preprocessing for Fuzzy Mining with MapReduce (EDPFM-MR)
3.1 Problem Statement and Definitions
3.2 Proposed Algorithm, EDPFM-MR
3.3 An Example of Using EDPFM-MR
CHAPTER 4 Parallel Genetic Fuzzy Membership Function Training with MapReduce (PGFMFT-MR)
4.1 Problem Statement and Definitions
4.1.1 Chromosome Representation
4.1.2 Initial Population
4.1.3 Fitness Function
4.1.4 Genetic Operators
4.2 Proposed Algorithm, PGFMFT-MR
4.3 An Example of Using PGFMFT-MR
CHAPTER 5 Parallel FP-Growth for Fuzzy Mining with MapReduce (PFPGFM-MR)
5.1 Problem Statement and Definitions
5.2 The Proposed PFPGFM-MR
5.3 An Example of Using PFPGFM-MR
CHAPTER 6 Experimental Evaluation
6.1 Experimental Datasets
6.2 Experimental Results of EDPFM-MR
6.3 Experimental Results of PGFMFT-MR
6.4 Experimental Results of PFPGFM-MR
CHAPTER 7 Conclusion
References