臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)
Author: 李櫂安 (LI, CHAO-AN)
Thesis Title: 基於連通域標示法之分群演算法
Thesis Title (English): A Clustering Algorithm Based on Connected Component Labeling
Advisor: 練光祐 (LIAN, KUANG-YOW)
Committee Members: 阮議聰 (JUAN, YEE-JSONG), 邱謙松 (CHIU, CHIAN-SONG), 曾傳蘆 (TSENG, CHWAN-LU), 黃正民 (HUANG, CHENG-MING)
Date of Oral Defense: 2019-07-25
Degree: Master's
University: 國立臺北科技大學 (National Taipei University of Technology)
Department: 電機工程系 (Department of Electrical Engineering)
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis Type: Academic thesis
Year of Publication: 2019
Academic Year: 107 (2018–2019)
Language: Chinese
Pages: 64
Keywords (Chinese): 非監督式分群演算法、資料區域化、數值二值化、連通域標示法
Keywords (English): Unsupervised Clustering Algorithm, Data Regionalization, Thresholding, Connected Component Labeling
Statistics:
  • Cited by: 0
  • Views: 203
  • Rating:
  • Downloads: 4
  • Bookmarked: 0
Abstract: To improve on the shortcomings of existing unsupervised clustering algorithms, this thesis proposes a novel clustering algorithm. Partitional algorithms group data by the distance between each data point and the cluster centers, but many data distributions are not suited to this approach. Density-based algorithms instead examine the neighborhood within a preset radius around each data point and check whether it contains a preset minimum number of points; if so, the neighborhood belongs to the same cluster, and if not, the point is treated as an outlier. Without prior knowledge of the data distribution, however, it is difficult to choose an appropriate radius and minimum point count. This thesis proposes an innovative clustering algorithm based on the concept of connected component labeling. The method first locates the density cores of the clusters through statistical preprocessing such as data regionalization and thresholding, and then applies connected component labeling, which effectively removes connected components with small areas. Instead of measuring distances to cluster centers, distances are measured to all coordinate points of each connected component, and data points lying outside the connected components are assigned to the nearest component. As a result, data can be assigned to the correct clusters even when the data set has a non-convex distribution, is compressed by outliers, has clusters that occupy regions of very different sizes, or has many sparsely distributed points between clusters.
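The pipeline described in the abstract (normalization, regionalization into grid cells, thresholding of cell densities, connected component labeling, removal of small components, and nearest-component assignment) can be illustrated with a short sketch. The following Python code is a minimal two-dimensional illustration written for this record, not the thesis's implementation: the grid size, the minimum component area, the use of Otsu's method on the nonzero cell counts, and the Manhattan distance between grid cells are all illustrative assumptions.

import numpy as np
from collections import deque

def otsu_threshold(values):
    # Split a 1-D sample into "low" and "high" values by maximizing the
    # between-class variance (Otsu's method on a 64-bin histogram).
    hist, edges = np.histogram(values, bins=64)
    centers = (edges[:-1] + edges[1:]) / 2
    total = hist.sum()
    best_t, best_var = centers[0], -1.0
    for i in range(1, len(hist)):
        w0, w1 = hist[:i].sum(), hist[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (hist[:i] * centers[:i]).sum() / w0
        m1 = (hist[i:] * centers[i:]).sum() / w1
        var = (w0 / total) * (w1 / total) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, centers[i]
    return best_t

def label_components(binary):
    # 4-connected component labeling on a boolean grid using BFS.
    labels = np.zeros(binary.shape, dtype=int)
    rows, cols = binary.shape
    current = 0
    for i in range(rows):
        for j in range(cols):
            if binary[i, j] and labels[i, j] == 0:
                current += 1
                labels[i, j] = current
                queue = deque([(i, j)])
                while queue:
                    r, c = queue.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary[nr, nc] and labels[nr, nc] == 0):
                            labels[nr, nc] = current
                            queue.append((nr, nc))
    return labels, current

def connected_component_clustering(points, grid=20, min_area=2):
    # 2-D only: illustrative version of the regionalization/CCL pipeline.
    points = np.asarray(points, dtype=float)
    # 1. Normalize each dimension to [0, 1] and map every point to a grid cell.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    cells = np.minimum(((points - mins) / span * grid).astype(int), grid - 1)
    # 2. Regionalization: count how many points fall into each cell.
    counts = np.zeros((grid, grid), dtype=int)
    for r, c in cells:
        counts[r, c] += 1
    # 3. Thresholding: keep only the dense cells (Otsu on the nonzero counts).
    binary = counts >= otsu_threshold(counts[counts > 0])
    # 4. Label the connected components and drop the small ones
    #    (but never drop every component).
    labels, n = label_components(binary)
    sizes = np.bincount(labels.ravel(), minlength=n + 1)[1:]
    if (sizes >= min_area).any():
        for k in np.where(sizes < min_area)[0] + 1:
            labels[labels == k] = 0
    # 5. Assign every point to the component whose labeled cell is nearest
    #    (Manhattan distance between grid cells).
    core = np.argwhere(labels > 0)
    assignment = np.empty(len(points), dtype=int)
    for idx, cell in enumerate(cells):
        nearest = core[np.abs(core - cell).sum(axis=1).argmin()]
        assignment[idx] = labels[nearest[0], nearest[1]]
    return assignment

# Example with two well-separated Gaussian blobs (hypothetical test data).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(200, 2)),
                  rng.normal([3.0, 3.0], 0.3, size=(200, 2))])
print(np.unique(connected_component_clustering(data)))

For the two blobs above, the call should report two surviving component labels; a real application would follow the thesis's own regionalization, filtering, and merging rules rather than these illustrative defaults.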
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Figures vii
List of Tables xi
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Research Objectives 8
1.3 Literature Review 8
1.4 Basic Concepts of Clustering Algorithms 11
1.5 Research Contributions 11
1.6 Thesis Organization 11
Chapter 2 Related Techniques 13
2.1 Clustering Algorithms 13
2.1.1 K-means Clustering 13
2.1.2 PAM Clustering 16
2.1.3 CLARA Clustering 18
2.1.4 DBSCAN Clustering 18
2.1.5 OPTICS Clustering 21
2.2 Mean Filter 23
2.3 Otsu's Method 25
2.4 Connected Component Labeling 26
Chapter 3 System Flow 28
3.1 Overall Architecture 28
3.2 Simplified Flow 30
Chapter 4 Connected Component Clustering Algorithm 32
4.1 Data Normalization 32
4.2 Data Regionalization 33
4.3 Outlier Removal by Mean Filter 36
4.4 Thresholding 37
4.5 Connected Component Labeling 39
4.6 Assigning Each Data Point to the Nearest Region 40
4.7 Merging Nearby Regions 41
Chapter 5 System Optimization 43
5.1 Addressing Insufficient Memory 43
5.2 Reducing Excessive Computation Time 44
5.2.1 Choosing a Suitable Number of Intervals per Dimension 44
5.2.2 Skipping the Outlier-Removal Step 46
Chapter 6 Experimental Results 48
6.1 Two- and Three-Dimensional Data 48
6.2 High-Dimensional Data 55
6.2.1 Experimental Data Sets 55
6.2.2 Experimental Procedure 55
6.2.3 Experimental Results 57
Chapter 7 Conclusion and Future Work 62
7.1 Conclusion 62
7.2 Future Work 62
References 63