National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

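Author: 劉育瑒
Author (English): Yu-Yang Liu
Thesis title: 運用MapReduce架構之平行遺傳模糊資料探勘
Thesis title (English): Parallel Genetic-Fuzzy Mining with MapReduce Architecture
Advisor: 洪宗貝
Advisor (English): Tzung-Pei Hong
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Computer Science and Engineering (資訊工程學系研究所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2015
Graduation academic year: 103 (ROC calendar)
Language: English
Number of pages:
Keywords (Chinese, translated): genetic algorithm, MapReduce, data preprocessing, fuzzy data mining, FP-growth
Keywords (English): MapReduce, genetic algorithm, FP-growth, fuzzy mining, data preprocessing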


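Abstract (translated from the Chinese):

Fuzzy data mining can effectively uncover the linguistic association rules hidden in a database by converting quantitative information into fuzzy membership functions. Good membership functions, however, are the key factor in the quality of the final association rules, so many earlier studies used genetic algorithms to train and improve the membership functions and thereby improve the rules. These methods still suffer from long execution times, and once the membership functions have been trained, mining the frequent itemsets is an equally time-consuming procedure. In this thesis we therefore propose a series of MapReduce-based algorithms to speed up genetic-fuzzy mining as a whole. The contributions fall into three parts: preprocessing of the raw data, training of membership functions with a genetic algorithm, and derivation of fuzzy association rules. Every step is distributed with MapReduce. Besides converting the data into the key-value format required by the MapReduce architecture, the preprocessing step gathers the quantity information of each item together, which reduces the number of redundant database scans. For genetic training of the membership functions, the fitness computation, which is the most time-consuming part, is designed as a distributed computation. Finally, this study designs a method based on distributed FP-growth to speed up the search for fuzzy association rules. The single-machine and MapReduce versions are compared and discussed in the experiments, and the results show that the proposed distributed methods effectively shorten the overall execution time of fuzzy mining.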

Abstract (English):

Fuzzy data mining can discover hidden linguistic association rules by transforming quantitative information into fuzzy membership values. In the derivation process, good membership functions play a key role in the quality of the final results. Several earlier studies trained membership functions with genetic algorithms (GAs) and did improve the quality of the rules found. Those methods, however, suffered from long execution times in the training phase. Moreover, once appropriate fuzzy membership functions have been found, mining the frequent itemsets with them is also very time-consuming, as in traditional data mining. In this thesis, we therefore propose a series of approaches based on the MapReduce architecture to speed up the GA-fuzzy mining process. The contributions fall into three parts: data preprocessing, membership-function training by GA, and fuzzy association-rule derivation. All three are performed with MapReduce. For data preprocessing, the proposed approach not only transforms the original data into the key-value format required by MapReduce, but also reduces redundant database scans by joining the quantities of each item into lists. For membership-function training by GA, the fitness evaluation, which is the most time-consuming step, is distributed to shorten the execution time. Finally, a distributed fuzzy rule mining approach based on FP-growth is designed to improve the time efficiency of finding fuzzy association rules. Experiments compare the performance of a single processor with that of MapReduce, and the results show that the proposed approaches effectively reduce the execution time of the whole process.
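The preprocessing idea in the abstract (emit key-value pairs, then join each item's quantities into a list so later steps need not rescan the database) can be illustrated with a minimal single-process sketch. This is not the thesis's EDPFM-MR algorithm: the function names, the toy transactions, the triangular membership function, and its parameters `(0, 4, 8)` are all assumptions made for illustration, and the map and reduce steps are simulated in plain Python rather than run on a MapReduce cluster.

```python
from collections import defaultdict


def map_transaction(transaction):
    """Map step (hypothetical): emit one (item, quantity) pair per item."""
    for item, quantity in transaction.items():
        yield item, quantity


def reduce_quantities(pairs):
    """Shuffle and reduce step (hypothetical): join each item's quantities into a list."""
    grouped = defaultdict(list)
    for item, quantity in pairs:
        grouped[item].append(quantity)
    return dict(grouped)


def triangular(x, left, center, right):
    """Membership degree of x in a triangular fuzzy set (assumed shape)."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def fuzzy_support(quantities, mf, n_transactions):
    """Sum of membership degrees over an item's quantity list, per transaction."""
    return sum(triangular(q, *mf) for q in quantities) / n_transactions


# Toy quantitative transactions (made up for this sketch).
transactions = [
    {"milk": 2, "bread": 4},
    {"milk": 6},
    {"bread": 4, "juice": 1},
]

pairs = [pair for t in transactions for pair in map_transaction(t)]
item_quantities = reduce_quantities(pairs)

# One assumed linguistic term ("middle") shared by all items.
middle = (0, 4, 8)
supports = {
    item: fuzzy_support(qs, middle, len(transactions))
    for item, qs in item_quantities.items()
}
```

Once the quantity lists exist, evaluating a candidate set of membership functions only needs the lists, not the original transactions, which is one plausible reading of why the abstract says the preprocessing reduces redundant database scans.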

Thesis Approval Form
Acknowledgements
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
CHAPTER 1 Introduction
1.1 Motivation
1.2 Contributions
1.3 Organization
CHAPTER 2 Related Works
2.1 Genetic Fuzzy Mining
2.2 MapReduce
2.3 Parallel FP-Growth
CHAPTER 3 Efficient Data Preprocessing for Fuzzy Mining with MapReduce (EDPFM-MR)
3.1 Problem Statement and Definitions
3.2 Proposed Algorithm, EDPFM-MR
3.3 An Example of Using EDPFM-MR
CHAPTER 4 Parallel Genetic Fuzzy Membership Function Training with MapReduce (PGFMFT-MR)
4.1 Problem Statement and Definitions
4.1.1 Chromosome Representation
4.1.2 Initial Population
4.1.3 Fitness Function
4.1.4 Genetic Operators
4.2 Proposed Algorithm, PGFMFT-MR
4.3 An Example of Using PGFMFT-MR
CHAPTER 5 Parallel FP-Growth for Fuzzy Mining with MapReduce (PFPGFM-MR)
5.1 Problem Statement and Definitions
5.2 The Proposed PFPGFM-MR
5.3 An Example of Using PFPGFM-MR
CHAPTER 6 Experimental Evaluation
6.1 Experimental Datasets
6.2 Experimental Results of EDPFM-MR
6.3 Experimental Results of PGFMFT-MR
6.4 Experimental Results of PFPGFM-MR
CHAPTER 7 Conclusion
References
