Graduate student (English name): Hon-Kai Chang
Thesis title (English): Study of Clustering Techniques Applied to Data Mining
Advisors (English names): Bin-Yin Liao, Jeng-Shyang Pan
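Every algorithm has its own suitable range of application, so a user facing a decision problem should choose an algorithm according to the nature of the data and the problem to be solved. The traditional CLARANS algorithm, for example, varies in execution efficiency depending on the amount of data. In today's digital, electronic age, achieving both high efficiency and high quality is an important issue; after all, "time is money", is it not? With this in mind, this thesis takes time as its main theme and develops the CRSM algorithm to address the efficiency problem. It also addresses the troublesome parameter-setting problem of the traditional density-based algorithm DBSCAN and proposes the CDAP algorithm to remedy it.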
Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this thesis, we present two novel algorithms that improve on CLARANS clustering and DBSCAN clustering. First, we propose a new clustering method called CRSM, which aims to identify spatial structures that may be present in the data. Second, building on DBSCAN, we develop a new spatial data mining algorithm called CDAP, which aims to discover relationships between spatial attributes. This algorithm can discover knowledge that is difficult to find with existing spatial data mining algorithms.
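To make the CLARANS-style search concrete, the following is a minimal sketch of a randomized medoid-swap search that also counts distance calculations. It is a generic illustration and not the CRSM algorithm, whose details are not given in this abstract; the names `CountingDistance`, `total_cost`, `random_swap_search`, and the parameters `max_neighbors` and `num_local` are assumptions chosen for this example.

```python
import random


class CountingDistance:
    """Euclidean distance that records how many times it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, p, q):
        self.calls += 1
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def total_cost(points, medoids, dist):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def random_swap_search(points, k, max_neighbors=50, num_local=2, seed=0):
    """Generic CLARANS-style search (illustrative, not the thesis's CRSM)."""
    rng = random.Random(seed)
    dist = CountingDistance()
    best_medoids, best_cost = None, float("inf")
    for _ in range(num_local):
        # Start each local search from a random set of k medoid indices.
        current = rng.sample(range(len(points)), k)
        current_cost = total_cost(points, current, dist)
        failures = 0
        while failures < max_neighbors:
            # A neighbor replaces one medoid with a random non-medoid.
            out_pos = rng.randrange(k)
            candidate = rng.choice(
                [i for i in range(len(points)) if i not in current]
            )
            neighbor = current[:out_pos] + [candidate] + current[out_pos + 1:]
            neighbor_cost = total_cost(points, neighbor, dist)
            if neighbor_cost < current_cost:
                current, current_cost = neighbor, neighbor_cost
                failures = 0
            else:
                failures += 1
        if current_cost < best_cost:
            best_medoids, best_cost = current, current_cost
    return sorted(best_medoids), best_cost, dist.calls
```

Every swap evaluation in this sketch recomputes the full cost, so the distance counter grows with both the data size and the number of neighbors examined. That is the kind of cost that the thesis reports measuring.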
Our experimental results show that the proposed scheme reduces the computational cost of the CLARANS algorithm, measured by both the total number of distance calculations and the overall computation time. In particular, the proposed CDAP algorithm can automatically estimate the two parameters of the DBSCAN algorithm and thereby improve its clustering performance.
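As a point of reference for what estimating a DBSCAN parameter can look like, here is a minimal sketch of the classic sorted k-distance heuristic for choosing the neighborhood radius. It is not the CDAP algorithm, which this abstract does not describe. The function names `k_distances` and `estimate_eps`, the default `min_pts=4`, and the largest-gap rule for locating the knee are all assumptions made for illustration.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return sorted(result)


def estimate_eps(points, min_pts=4):
    """Crude knee detection (an assumption): take the k-distance value
    just before the largest jump in the sorted k-distance curve."""
    kd = k_distances(points, min_pts)
    gaps = [kd[i + 1] - kd[i] for i in range(len(kd) - 1)]
    knee = max(range(len(gaps)), key=gaps.__getitem__)
    return kd[knee]


# Usage: a 3x3 grid of dense points plus two distant outliers.
grid = [(x, y) for x in range(3) for y in range(3)]
data = grid + [(100, 100), (-100, 50)]
eps = estimate_eps(data, min_pts=4)
```

In this toy data set the k-distances of the grid points are small and those of the outliers are large, so the largest jump separates the two groups. Real data rarely has such a clean knee, so a more robust rule would be needed in practice.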
1. Introduction
1.1 Research motivation
1.2 Research objectives
1.3 Research methods
1.4 Thesis organization
2. Clustering algorithms
2.1 Overview
2.2 Basic concepts of clustering
2.3 Partitioning clustering algorithms
2.4 Hierarchical clustering algorithms
2.5 The CLARANS (Clustering Large Applications based on RANdomized Search) clustering algorithm
2.6 Density-based clustering algorithms
3. Principles of the new algorithms
3.1 Overview
3.2 CRSM (Clustering based on Random Swap Medoids)
3.3 CDAP (Clustering based on Density of Adaptive
3.3.1 Steps of the CDAP data clustering algorithm
4. Research results
4.1 Description of the experiments
4.2 CRSM experimental data
4.3 Description of the CDAP experimental data sets
5. Conclusions and discussion
6. Future research directions
6.1 Overview
6.2 Parallel mining of association rules
6.3 CRM Hopfield-Tank neural network model
6.4 Suggestions for future research
References
Appendix: Neural networks
I. Overview
II. Structure of biological and artificial neural networks
III. Self-organizing map networks
