Graduate Student (English name): Zhao-yu Wang
Thesis Title (English): Some Variants of Self-Constructing Clustering
Advisor (English name): Shie-Jue Lee
Keywords (English): training cycle; iterative computing; quadratic programming; z-distance; data clustering; data mining; self-constructing clustering
Self-constructing clustering (SCC), developed by Lee & Ouyang in 2003, does not require the user to specify the number of clusters in advance. For a given set of instances, SCC performs only one training cycle on the instances, so it is fast. However, once an instance has been assigned to a cluster, the assignment is never changed afterwards. As a result, the clusters produced may depend on the order in which the instances are considered, and assignment errors are more likely to occur. In addition, all dimensions are weighted equally in the distance computation, which may not be suitable in certain applications. In this paper, two improved versions of SCC, SCC-I and SCC-IW, are proposed. SCC-I allows two or more training cycles to be performed on the instances, and an instance can be re-assigned to another cluster in each cycle. The desired number of clusters is obtained when no assignment has changed in the current cycle. In this way, the clusters produced are less likely to be affected by the feeding sequence of the instances. SCC-IW is an extension of SCC-I that allows each dimension to be weighted differently in the clustering process. The values of the weights are adaptively learned from the data. This is useful when certain relevance exists between different dimensions. A number of experiments with real-world benchmark datasets are conducted, and the results demonstrate the effectiveness of the proposed ideas.
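The multi-cycle re-assignment idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual formulation: the Gaussian membership function, the fixed deviation `sigma`, the threshold `rho`, and the names `membership`, `mean`, and `iterative_scc` are all assumptions made for illustration. The sketch only shows the general pattern of creating clusters on demand and repeating passes until no assignment changes.

```python
import math

def membership(x, center, sigma):
    # Assumed similarity: Gaussian of the scaled squared distance.
    d2 = sum(((a - c) / sigma) ** 2 for a, c in zip(x, center))
    return math.exp(-d2)

def mean(points, idx):
    # Component-wise mean of the points whose indices are in idx.
    dim = len(points[0])
    return [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]

def iterative_scc(points, rho=0.5, sigma=1.0, max_cycles=20):
    assign = [None] * len(points)  # cluster index of each instance
    members = []                   # members[c] = set of instance indices
    for _ in range(max_cycles):
        changed = False
        for i, x in enumerate(points):
            old = assign[i]
            if old is not None:
                members[old].discard(i)  # take the instance out first
            # Find the most similar non-empty cluster that passes rho.
            best, best_mu = None, rho
            for c, idx in enumerate(members):
                if idx:
                    mu = membership(x, mean(points, idx), sigma)
                    if mu >= best_mu:
                        best, best_mu = c, mu
            if best is None:
                if old is not None and not members[old]:
                    best = old  # singleton cluster: keep it as is
                else:
                    members.append(set())  # no cluster fits: create one
                    best = len(members) - 1
            members[best].add(i)
            assign[i] = best
            changed = changed or best != old
        if not changed:  # stop when a full cycle changes nothing
            break
    # Relabel clusters as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(c, len(labels)) for c in assign]

points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 0.1)]
print(iterative_scc(points))
```

With `max_cycles=1` the same function reduces to a single-pass scheme in which early assignments are never revisited, which is the behavior the abstract attributes to the original SCC.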
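Acknowledgements
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Research Background
1.2. Research Motivation and Objectives
1.3. Thesis Organization
Chapter 2 Literature Review
2.1. Self-Constructing Clustering (SCC)
2.2. K-means with Time Weighting (TSKmeans)
Chapter 3 Iterative Self-Constructing Clustering
3.1. Method Description
3.2. Example
Chapter 4 Iterative Weighted Self-Constructing Clustering
4.1. Motivation
4.2. Method Description
4.3. Example
Chapter 5 Experimental Results and Discussion
5.1. Multi-class Datasets
5.2. Time Series Datasets
5.3. Effect of the Parameter α and Discussion
Chapter 6 Conclusion and Future Work
References
Appendix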
