National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

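Graduate student: Hon-Kai Chang (張紘愷)
Thesis title: Study of Clustering Techniques Applied to Data Mining
Thesis title (Chinese): 應用分群技術於資料探勘之研究
Advisors: Bin-Yin Liao (廖斌毅); Jeng-Shyang Pan (潘正祥)
Degree: Master's
Institution: National Kaohsiung University of Applied Sciences (國立高雄應用科技大學)
Department: Master's Program, Graduate Institute of Electronic and Information Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2004
Graduation academic year: 92 (ROC calendar, i.e. 2003-2004)
Language: Chinese
Number of pages: 76
Keywords (Chinese): 群聚演算法 (clustering algorithm); 分群演算法 (grouping algorithm)
Keywords (English): Cluster; CRSM; CDAP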
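Data clustering has been an important research topic in data mining in recent years. The clustering problem considers a data set in which each record has several attributes; the number of attributes is called the dimensionality. A clustering algorithm uses the attribute values to partition the data set appropriately into subsets, called clusters, such that objects within the same cluster are as similar as possible and objects in different clusters are as dissimilar as possible.
Every algorithm has its own suitable range of applications, so a user facing a decision problem should choose an algorithm according to the nature of the data and the problem to be solved. The efficiency of the traditional CLARANS algorithm, for example, varies with the amount of data. In today's digital, electronic era, achieving both high efficiency and high quality is an important issue; time is money, after all. This thesis therefore takes time as its main theme and develops the CRSM algorithm to address the efficiency problem. It also addresses the troublesome parameter-setting problem of the traditional density-based algorithm DBSCAN and proposes the CDAP algorithm to improve on it.
We therefore combine cluster analysis with optimization to organize the data and offer a new method, so that the data placed in the same group after clustering are optimally related, which reduces unnecessary waste and provides the greatest benefit. By studying several important clustering algorithms, this research seeks ways to solve the clustering parameter problem and to choose an optimum, overcoming the respective shortcomings of these algorithms to achieve cluster analysis that balances speed and quality.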
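To make the efficiency concern concrete, the following is a minimal Python sketch of a CLARANS-style randomized medoid search that also counts distance calculations. It is not the CRSM algorithm, whose steps are not given in this record. The function names, the toy cost function, and the default values are illustrative assumptions; `max_neighbors` and `num_local` mirror the maxneighbor and numlocal parameters of CLARANS.

```python
import math
import random


def total_cost(points, medoids, counter):
    """Sum of distances from every point to its nearest medoid."""
    cost = 0.0
    for p in points:
        best = math.inf
        for m in medoids:
            counter["distance_calls"] += 1  # count every distance evaluation
            best = min(best, math.dist(p, points[m]))
        cost += best
    return cost


def randomized_medoid_search(points, k, max_neighbors=50, num_local=3, seed=0):
    """CLARANS-style search: repeatedly try swapping one medoid at random."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < len(points)")
    rng = random.Random(seed)
    counter = {"distance_calls": 0}
    best_medoids, best_cost = None, math.inf
    for _ in range(num_local):
        # Start each local search from a random set of k medoid indices.
        current = rng.sample(range(len(points)), k)
        current_cost = total_cost(points, current, counter)
        failures = 0
        while failures < max_neighbors:
            # A neighbour differs from the current solution in one medoid.
            i = rng.randrange(k)
            candidate = rng.choice(
                [j for j in range(len(points)) if j not in current]
            )
            neighbor = current[:i] + [candidate] + current[i + 1:]
            neighbor_cost = total_cost(points, neighbor, counter)
            if neighbor_cost < current_cost:
                current, current_cost = neighbor, neighbor_cost
                failures = 0  # improvement found, restart the failure count
            else:
                failures += 1
        if current_cost < best_cost:
            best_medoids, best_cost = sorted(current), current_cost
    return best_medoids, best_cost, counter["distance_calls"]
```

Because every candidate swap recomputes the full cost, the number of distance calculations grows with both the data size and the number of swaps tried, which is the kind of cost an efficiency-oriented variant would try to reduce.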
Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this paper, we present two novel algorithms that build on CLARANS clustering and DBSCAN clustering. First, we propose a new clustering method called CRSM, which aims to identify spatial structures that may be present in the data. Second, building on DBSCAN, we develop a new spatial data mining algorithm called CDAP, which aims to discover relationships between spatial attributes. This algorithm can discover knowledge that is difficult to find with existing spatial data mining algorithms.
Our experimental results demonstrate that our scheme reduces the computational cost of the CLARANS algorithm, measured by both the total number of distance calculations and the overall computation time. In particular, the proposed CDAP algorithm can automatically estimate the two parameters of the DBSCAN algorithm and thereby improve its clustering performance.
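The two DBSCAN parameters are the neighbourhood radius Eps and the density threshold MinPts. As a hedged illustration of what estimating them from the data can look like, the sketch below applies the sorted k-distance heuristic suggested in the original DBSCAN paper, with MinPts fixed to k and Eps read off at the largest jump in the curve. It is not the CDAP procedure, which this record does not describe; the function names and the largest-gap rule are assumptions made for illustration.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def estimate_eps(points, k=4):
    """Choose Eps as the k-distance just below the largest jump in the curve.

    Assumption: points above the jump are sparse (noise candidates) and
    points below it belong to dense regions.
    """
    if len(points) < k + 2:
        raise ValueError("need at least k + 2 points")
    curve = k_distances(points, k)
    gaps = [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]
    knee = max(range(len(gaps)), key=gaps.__getitem__)
    return curve[knee]


# Usage: a dense 5 x 5 grid plus two isolated points.
grid = [(x, y) for x in range(5) for y in range(5)]
eps = estimate_eps(grid + [(100, 100), (200, -50)], k=4)
```

The brute-force neighbour search here is quadratic in the number of points; a spatial index would normally replace it on large data sets.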
1. Introduction
1.1 Research motivation
1.2 Research objectives
1.3 Research method
1.4 Thesis organization
2. Clustering algorithms
2.1 Preface
2.2 Basic concepts of clustering
2.3 Partition clustering algorithm
2.4 Hierarchical clustering algorithm
2.5 CLARANS (Clustering Large Applications based on RANdomized Search) clustering algorithm
2.6 Density-based clustering algorithms
3. Principles of the new algorithms
3.1 Preface
3.2 CRSM (Clustering based on Random Swap Medoids)
3.3 Overview of CDAP (Clustering based on Density of Adaptive Parameter)
3.3.1 Steps of the CDAP data clustering algorithm
4. Research results
4.1 Experiment description
4.2 CRSM experimental data
4.3 Description of the CDAP experimental data sets
5. Conclusions and discussion
6. Future research directions
6.1 Preface
6.2 Parallel mining of association rules
6.3 CRM Hopfield-Tank neural network model
6.4 Suggestions for future research
References
Appendix: Neural networks
I. Preface
II. Structure of biological and artificial neural networks
III. Self-organizing map networks