National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

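Author: 陳奕廷 (CHEN, YI-TING)
Thesis title: An Improved K-means Algorithm based on MapReduce Framework (基於MapReduce架構下改善K-means演算法)
Advisor: 賴國華 (K. Robert Lai)
Committee members: 周志岳 (Chih-Yueh Chou), 許嘉裕 (Chia-Yu Hsu), 劉晨鐘 (Chen-Chung Liu), 藍中賢 (Chung-Hsien Lan)
Defense date: 2016-06-28
Degree: Master's
Institution: Yuan Ze University (元智大學)
Department: Department of Computer Science and Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and computer engineering
Thesis type: Academic thesis
Publication year: 2016
Graduation academic year: 104 (ROC calendar)
Language: Chinese
Pages: 35
Keywords: Data mining, Cluster analysis, K-means, MapReduce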
As data continues to be collected and accumulated, data mining is applied to big data to find associations within large volumes of data and to uncover the information hidden in them. Cluster analysis, a data mining technique, simplifies data by grouping it so that it can be analyzed. This thesis studies the problems of the K-means algorithm and improves on them. K-means has several drawbacks: the user must decide the number of clusters K, the initial points are generated randomly, and processing becomes too slow or infeasible on large amounts of data. To address these problems, we propose an improved K-means algorithm based on the MapReduce framework. It uses hierarchical agglomerative clustering (HAC) to generate the initial points, which addresses the problem of random initialization, and it uses the silhouette coefficient to choose the best number of clusters K. The results show that the method selects the best K automatically, generates initial points stably, and can handle large amounts of data.
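The two ideas in the abstract, HAC-based initial points and silhouette-based selection of K, can be illustrated with a minimal single-machine sketch. This is not the thesis's MapReduce implementation, which this record does not describe in detail. The function names, the centroid-linkage merge rule, and the candidate range for K are all assumptions made for illustration. In a typical MapReduce formulation of K-means, the assignment step below would correspond to the map phase and the centroid update to the reduce phase.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def hac_centroids(points, k):
    """Agglomerative clustering (centroid linkage, an assumption here):
    repeatedly merge the two closest clusters until k remain, then
    return the cluster means as deterministic initial points."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return [mean(c) for c in clusters]

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd iterations starting from the given centroids."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster happens to become empty.
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    if len(groups) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = groups[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in g) / len(g)
                for other, g in groups.items() if other != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_candidates):
    """Run HAC-initialized K-means for each candidate K and keep the
    result with the highest mean silhouette coefficient."""
    best = None
    for k in k_candidates:
        labels, centroids = kmeans(points, hac_centroids(points, k))
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels, centroids)
    return best

if __name__ == "__main__":
    offsets = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5)]
    data = [(cx + dx, cy + dy)
            for cx, cy in [(0, 0), (10, 10), (20, 0)]
            for dx, dy in offsets]
    score, k, labels, centroids = choose_k(data, range(2, 6))
    print(k, round(score, 3))
```

Because the initial points come from a deterministic merge procedure rather than random sampling, repeated runs on the same data give the same clustering, which is the stability property the abstract refers to. The pairwise HAC loop here is cubic in the number of points, so on large data it would in practice be run on a sample or replaced by a distributed variant.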
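Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2. Literature Review
2.1 K-means Algorithm
2.1.1 Improving the Initial Points of K-means
2.1.2 Improving the Number of Clusters in K-means
2.1.3 Improving the Running Time of K-means
2.2 Hadoop
2.2.1 Hadoop Distributed File System
2.2.2 MapReduce Workflow
2.3 Cluster Evaluation
Chapter 3. Methodology
3.1 Algorithm Flow
3.2 MapReduce Algorithm Flow
Chapter 4. Results and Analysis
Chapter 5. Conclusion
References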
[1] Witten, I.H. and E. Frank, Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann, 2005.
[2] Kaufman, L. and P.J. Rousseeuw, Finding groups in data: an introduction to cluster analysis, 344. John Wiley & Sons, 2009.
[3] Dean, J. and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[4] Arai, K. and A.R. Barakbah, "Hierarchical K-means: an algorithm for centroids initialization for K-means," Reports of the Faculty of Science and Engineering, vol. 36, no. 1, pp. 25-31, 2007.
[5] Zhao, W., H. Ma, and Q. He, "Parallel k-means clustering based on MapReduce," in Cloud Computing, IEEE International Conference on Cloud Computing, 2009, pp. 674-679.
[6] Cui, X., et al., "Optimized big data K-means clustering using MapReduce," The Journal of Supercomputing, vol. 70, no. 3, pp. 1249-1259, 2014.
[7] Lin, K., et al., "A K-means clustering with optimized initial center based on Hadoop platform," International Conference on Computer Science & Education, ICCSE, Aug. 2014, pp. 263-266.
[8] MacQueen, J., "Some methods for classification and analysis of multivariate observations," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967, pp. 281-297.
[9] Jain, A.K., M.N. Murty, and P.J. Flynn, "Data clustering: a review," ACM Computing Surveys, CSUR, vol. 31, no. 3, pp. 264-323, 1999.
[10] Arthur, D. and S. Vassilvitskii, "k-means++: The advantages of careful seeding," Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027-1035.
[11] Bradley, P.S. and U.M. Fayyad, "Refining Initial Points for K-Means Clustering," The International Conference on Machine Learning, vol. 98, 1998, pp. 91-99.
[12] Ester, M., et al., "A density-based algorithm for discovering clusters in large spatial databases with noise," Conference on Knowledge Discovery and Data Mining, vol. 96, 1996, pp. 226-231.
[13] Yuan, Q., H. Shi, and X. Zhou, "An optimized initialization center K-means clustering algorithm based on density," IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems, CYBER, 2015, pp. 790-794.
[14] Hamerly, G. and C. Elkan, "Learning the k in k-means," Advances in Neural Information Processing Systems, vol. 16, pp. 281, 2004.
[15] Kettani, O., F. Ramdani, and B. Tadili, "AK-means: an automatic clustering algorithm based on K-means," Journal of Advanced Computer Science & Technology, vol. 4, no. 2, pp. 231-236, 2015.
[16] Debatty, T., et al., "Determining the k in k-means with MapReduce," International Conference on Extending Database Technology, 2014, pp. 19-28.
[17] Ma, L., et al., "An Improved K-means Algorithm based on Mapreduce and Grid," International Journal of Grid Distribution Computing, vol. 8, no. 1, 2015.
[18] White, T., Hadoop: The definitive guide, O'Reilly Media, Inc., 2012.
[19] Rousseeuw, P.J., "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis," Journal of Computational and Applied Mathematics, vol. 20, pp. 53-65, 1987.
[20] Provost, F.J., T. Fawcett, and R. Kohavi, "The case against accuracy estimation for comparing induction algorithms," International Conference on Machine Learning, ICML, vol. 98, 1998, pp. 445-453.
[21] Dunn, J.C., "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," 1973.
[22] Ray, S. and R.H. Turi, "Determination of number of clusters in k-means clustering and application in colour image segmentation," Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques, 1999, pp. 137-143.