跳到主要內容

臺灣博碩士論文加值系統

(98.82.120.188) 您好!臺灣時間:2024/09/11 09:36
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:陳品文
研究生(外文):Pin-Wen Chen
論文名稱:K-means群集演算法初始化之新方法
論文名稱(外文):New Methods for the Initialization of K-means Clustering Algorithm
指導教授:許志堅許志堅引用關係
指導教授(外文):Jyh-Jian Sheu
學位類別:碩士
校院名稱:國立東華大學
系所名稱:企業管理學系
學門:商業及管理學門
學類:企業管理學類
論文種類:學術論文
論文出版年:2007
畢業學年度:95
語文別:中文
論文頁數:67
中文關鍵詞:群集分析K-means 演算法階層式演算法
外文關鍵詞:Hierarchical methodCluster AnalysisK-means algorithm
相關次數:
  • 被引用被引用:17
  • 點閱點閱:3543
  • 評分評分:
  • 下載下載:345
  • 收藏至我的研究室書目清單書目收藏:0
分群技術是資料探勘中一個重要的特徵辨識工具,被廣泛應用在不同的領域包含電腦科學、統計及生物等。近年來,研究者提出許多不同型式的分群技術,其中,相對於其它方法而言,較有效率之K-means演算法則成為分群技術中最常用且重要的方法之一,並且被廣泛使用於群集分析相關應用之中。然而,K-means雖具有效率以及簡易之特質,但也存在許多缺點,例如:雜亂資料辨識、要求使用者輸入參數和用隨機方式選取初始點成效不彰等。目前已有許多研究者試圖克服這些缺點,以增進K-means效率。
我們針對K-means演算法之初始化部分做改善,並減低雜亂資料之影響。本研究中,結合了階層式方法來改善K-means演算法隨機選取初始點之缺點。首先,我們結合網格的概念,提出了Bi-Section演算法,針對各維度做二元切割,將資料空間切割成若干區塊,並計算區塊內的相關統計資訊來決定初始群中心的分配以及是否向下遞迴地切割。第二部份,我們針對第一部分的Bi-Section演算法再進行改良,提出HBi-Section演算法,利用樹狀資料結構來計算區塊的相關資訊並且控制它的計算複雜度,期望能改善K-means演算法之效率。
Cluster analysis is an important pattern recognition tools in data mining, which is widely used in various fields such as computer science, statically analysis and biology. K-means algorithm is one of the most popular used in cluster analysis; it is also more efficient than other methods. While K-means algorithm also has several shortcomings: the selecting of initial cluster centroids has great impacts on the executive efficiency of clustering; the user has to decide the number of clusters in advance; it is very sensitive for noise or isolated data points.
This study focuses on improving the initialization of K-means algorithm, and tries to reduce the effect of noise data upon the clustering results. We combine the concept of hierarchical method and grid method to improve K-means algorithm. In the first part of this study, we propose a new algorithm: Bi-Section. Bi-Section algorithm will first bisect each dimension of data space, and thus the data space is divided into 2d parts. Then Bi-Section will compute the statistical information for each part to decide the allocation of the number of initial cluster centroids. In the second part, we propose the HBi-Section algorithm, which is based on Bi-Section. HBi-Section algorithm will build a tree structure to quickly compute the statistical information for each of the 2d parts. Thus, we can obtain an efficient improved K-means algorithm.
摘要................................................................................................................ I
Abstract .......................................................................................................... II
致謝..............................................................................................................III
目錄..............................................................................................................IV
圖目錄..........................................................................................................VI
表目錄........................................................................................................ VII
第一章 緒論...............................................................................................1
1.1 研究背景.........................................................................................................1
1.2 研究動機及目的.............................................................................................1
1.3 研究架構.........................................................................................................2
第二章 文獻回顧......................................................................................5
2.1 資料分群演算法介紹.....................................................................................5
2.1.1 分割式(Partitioning)演算法....................................................................5
2.1.2 階層式(Hierarchical)演算法...................................................................7
2.1.3 密度基礎式(Density-based)演算法......................................................10
2.1.4 網格基礎式(Grid-based)演算法...........................................................12
2.2 K-means 分群演算法..................................................................................15
2.2.1 K-means 演算法之簡介........................................................................16
2.2.2 K-means 優缺點及改善........................................................................20
2.2.3 K-means 初始化方法介紹....................................................................21
第三章 Bi-Section 與 HBi-Section 演算法...........................................23
3.1 研究流程.....................................................................................................23
3.2 Bi-Section 方法............................................................................................23
3.3 HBi-Section 方法.........................................................................................26
第四章 實驗結果與分析........................................................................31
4.1 資料設計.....................................................................................................31
4.2 實驗結果.....................................................................................................35
4.3 分析比較.......................................................................................................50
4.3.1 效率.........................................................................................................50
4.3.2 正確性.....................................................................................................53
4.3.3 穩健性.....................................................................................................57
第五章 結論與未來展望........................................................................61
5.1 結論.............................................................................................................61
5.2 未來展望.....................................................................................................62
參考文獻......................................................................................................63
[ABK99] M.Ankerst, M.M. Breuning, H. P. Kriegel, and J. Sander, “OPTICS: ordering points to identify the clustering structure, ” ACM SIGMOD Record, vol.28, issue 2, pp.49-60,1999.
[AR96] M.B. Al-Daoud, and S.A. Roberts, “New method for the initialization of clusters,” Pattern Recognition Letters, vol. 17, pp. 451-455, 1996.
[ARS98] K. Alsabti, S. Ranka and V. Sigh, “An Efficient K-means Clustering algorithm,” in Proc. 1st Workshop on High Performance Data Mining, Orlando, FL, 1998.
[BFR98] P.S. Bradley, U. Fayyad, and C. Reina, “Scaling Clustering Algorithms to Large Databases,” in Proc. 4th International Conference on Knowledge Discovery and Data Mining, Menlo Park, Calif, 1998.
[CCL04] J.S. Chen, R.K.H. Ching, and Y.S. Lin, “An extended study of the K-means algorithm for data clustering and its applications,” The Journal of the Operational Research Society, vol. 55, no.9, pp.976-987, 2004.
[Che03] Y.M. Cheung, “K-Means: A New Generalized K-means Clustering Algorithm,” Pattern Recognition Letters, vol. 24, issue 1, pp. 2883-2893, 2003.
[CJ02] J.W. Chang, and D.S. Jin, “A New Cell-Based Clustering Method for Large, High-Dimensional Data in Data Mining Applications,” in Proc. ACM Symposium on Applied Computing, Madrid, Spain, 2002.
[CL07] K.L. Chung and J.S. Lin, “Fast and more robust point symmetry-based K-means algorithm,” Pattern Recognition, vol. 40, pp. 410-422, 2007.
[Cla02] D.A. Clausi, “K-means Iterative Fisher (KIF) unsupervised clustering algorithm applied to image texture segmentation,” Pattern Recognition, vol. 35, no. 9, pp. 1959-1972, 2002.
[EKSX96] M. Ester, H.P. Kriegel, J. Sander, and X. Xu,“Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” in Proc. 1996 Internation Conf. Knowledge Discovery and Data Mining, pp. 226-231, 1996.
[FV06] P. Franti, and O. Virmajoki, “Iterative shrinking method for clustering problems,” Pattern Recognition, vol. 39, no. 5, pp. 761-775, 2006.
[GRS98] S. Guha, S., R. Rastogi, and K. Shim, “CURE: An Efficient Clustering Algorithm for Large Databases,” in Proc. ACM SIMOD Conference on Management of Data, Seattle, WA, USA, pp.73-84, 1998.
[GRS99] S. Guha, R. Rastogi, and K. Shim, “ROCK: A Robust Clustering Algorithm for Categorical Attributes, ” in Proc. 15th International Conference on Data Engineering, pp.512, 1999.
[HLT04] J. He, M. Lan, C.L. Tan, S.Y. Sung, and H.B. Low, “Initialization of Cluster Refinement Algorithms: A Review and Comparative Study,” in Proc. International Joint Conference on Neural Networks, pp.297-302, 2004.

[JCH02] I. Jonyer, D.J. Cook, and L.B. Holder, “Graph-Based Hierarchical Conceptual Clustering,” The Journal of Machine Learning Research, vol. 2, pp.19-43, 2002.
[JGA06] R. Jinm, A. Goswami, and G. Agrawal, “Fast and exact out-of-core and distributed k-means clustering,” The Journal of Knowledge and Information Systems, vol.10, issue 1, pp. 17-40, 2006.
[KF90]L. Kaufman and P.J. Rousseeuw, “Finding Groups in Data: An Introduction to Cluster Analysis,” John Wiley & Sons, 1990.
[KHK99] G. Karypis, E.H. Han, and V. Kunmar, “Chameleon: Hierarchical Clustering Using Dynamic Modeling,” IEEE Computer Society, vol. 32, no. 8, pp.68-75,1999.
[MZ04] D. Ma, and A. Zhang, “An adaptive Density-Based Clustering Algorithm for Spatial Database with Noise, ” in Proc. 4th IEEE International Conference on Data Mining, Brighton, UK, pp. 467-470, 2004.
[NG94] R.T. Ng, and J. Han, "Efficient and Effective Clustering Methods for Spatial Data Mining," in Proc 20th International Conference on Very Large Data Bases, Santiago, 1994
[PDN04] D.T. Pham, S.S. Dimov, and C.D. Nguyen, “An Incremental K-means Algorithm,” in Proc. I MECH E Part C Journal of Mechanical Engineering Science, vol. 218, no. 7, pp. 783-795, 2004.

[PLL99] J.M Pena, J.A . Lozano, and P. Larranaga, “An empirical comparison of four initialization methods for the k-means algorithm,” Pattern Recognition Letters, vol. 20, pp.1027-1040, 1999.
[PS05] A.H. Pilevar, and M. Sukumar, “GCHL: A grid-clustering algorithm for high-dimensional very large spatial data bases,” Pattern Recognition, vol. 26, issue. 7, pp. 999-1010, 2005.
[SC01] M.C. Su, and C.H. Chou, “A Modified Version of the K-Means Algorithm with a Distance Based on Cluster Symmetry,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, no. 6, pp.674-680, 2001.
[SCZ00] G.. Sheikholeslami, S. Chatterjee, and A. Zhang, “WaveCluster: a Wavelet-Based Clustering Approach for Spatial Data in Very Large Databases,” The International Journal on Very Large Data Bases, vol. 8, issue 3-4, pp.289-394, 2000.
[TL99] E.W. Tyree, and J.A. Long, “The use of linked line segments for cluster representation and data reduction,” Pattern Recognition Letters, vol. 20, issue 1, pp. 21-29, 1999.
[WYM97] W. Wang, J. Yang, and R. Muntz, “STING: A Statistical Information Grid Approach to Spatial Data Mining,” in Proc. 23rd VLDB Conference, Athens, Greece, pp.186-195, 1997.


[ZRL96] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An Efficient Data Clustering Method for Very Large Databases,” in Proc. ACM SIGMOD Conference on Management of Data, Montreal, Canada, pp.103-114, 1996.
[ZRL97] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH:A New Data Clustering Algorithm and Its Applications,” Data Mining and Knowledge Discovery, pp.141-181, 1997.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊