
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: 王照宇 (Zhao-yu Wang)
Title: 自建構分群演算法之研究 (Some Variants of Self-Constructing Clustering)
Advisor: 李錫智 (Shie-Jue Lee)
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2017
Academic year of graduation: 105
Language: Chinese
Number of pages: 62
Keywords (Chinese): data mining; self-constructing clustering; data clustering; quadratic programming; z-distance; training cycle; iterative computing
Keywords (English): training cycle; iterative computing; quadratic programming; z-distance; data clustering; data mining; self-constructing clustering
Usage statistics:
  • Cited by: 0
  • Views: 179
  • Downloads: 4
  • Bookmarked: 0
Lee & Ouyang developed the self-constructing clustering method (SCC) in 2003. It does not require the user to specify the number of clusters in advance, and the algorithm makes only a single pass over the whole dataset. However, once a data point has been assigned to a cluster, the assignment is never revised; this can lead to assignment errors and makes the resulting clusters sensitive to the order in which the data are fed in. Moreover, every dimension carries the same weight in the distance computation, which may be unsuitable for certain applications.
This thesis proposes two improved versions of SCC, SCC-I and SCC-IW. SCC-I runs two or more training cycles, and within each cycle a data point may be re-assigned to a different cluster. When no assignment changes during a cycle, a suitable number of clusters has been reached and the algorithm terminates. As a result, the produced clusters are less likely to be affected by the input order of the data. SCC-IW extends SCC-I by giving each data dimension a different weight during the clustering process; these weights are learned adaptively from the data. This is effective when the dimensions differ in relevance. A series of experiments on various real-world datasets demonstrates the effectiveness of the proposed methods.
Self-constructing clustering (SCC) was proposed, in which the number of clusters need not be specified in advance by the user. For a given set of instances, SCC performs only one training cycle on the instances, so it is fast. However, once an instance has been assigned to a cluster, the assignment will not be changed afterwards. Consequently, the clusters produced may depend on the order in which the instances are considered, and assignment errors are more likely to occur. Also, all dimensions are weighted equally, which may not be suitable in certain applications. In this paper, two improved versions of SCC, SCC-I and SCC-IW, are proposed. SCC-I allows two or more training cycles to be performed on the instances, and an instance can be re-assigned to another cluster in each cycle. A desired number of clusters is obtained when no assignment has changed in the current cycle. In this way, the clusters produced are less likely to be affected by the feeding order of the instances. SCC-IW is an extension of SCC-I, allowing each dimension to be weighted differently in the clustering process. The values of the weights are adaptively learned from the data, which is useful when the dimensions differ in relevance. A number of experiments with real-world benchmark datasets are conducted, and the results demonstrate the effectiveness of the proposed ideas.
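The re-assignment loop described in the abstract (the SCC-I idea) can be sketched roughly as follows. This is an illustrative simplification, not the thesis's actual algorithm: SCC proper builds clusters from running statistics with Gaussian-like membership, whereas this sketch substitutes a plain distance threshold on a (optionally weighted) Euclidean distance. The function name and the `threshold` parameter are hypothetical.

```python
import numpy as np

def scc_like_clustering(X, threshold=1.5, max_cycles=50, weights=None):
    """Iterative self-constructing-style clustering (illustrative sketch).

    An instance joins the nearest existing cluster if its weighted
    Euclidean distance to that cluster's mean is within `threshold`;
    otherwise it seeds a new cluster. Training cycles repeat, allowing
    re-assignment, until a cycle changes nothing (the SCC-I stopping rule).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    w = np.ones(d) if weights is None else np.asarray(weights, dtype=float)
    labels = np.full(n, -1, dtype=int)
    centers = []  # one running mean per cluster

    for _ in range(max_cycles):
        changed = False
        for i in range(n):
            x = X[i]
            k = -1
            if centers:
                dists = [np.sqrt(np.dot(w, (x - c) ** 2)) for c in centers]
                k = int(np.argmin(dists))
                if dists[k] > threshold:
                    k = -1
            if k == -1:                # no cluster is close enough: create one
                centers.append(x.copy())
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        for k in range(len(centers)):  # refresh each cluster mean
            members = X[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
        if not changed:                # stable cycle: done
            break
    return labels, np.array(centers)
```

In the spirit of SCC-IW, passing a `weights` vector stretches the distance along selected dimensions; the thesis learns such weights adaptively from the data, whereas here they would be supplied by hand.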
Acknowledgments i
Abstract (Chinese) ii
Abstract (English) iii
List of Figures vi
List of Tables vii
Chapter 1: Introduction 1
1.1 Research Background 1
1.2 Motivation and Objectives 3
1.3 Thesis Organization 4
Chapter 2: Literature Review 5
2.1 Self-Constructing Clustering (SCC) 5
2.2 Time-Weighted K-means (TSKmeans) 7
Chapter 3: Iterative Self-Constructing Clustering 10
3.1 Method 10
3.2 Examples 12
Chapter 4: Iterative Weighted Self-Constructing Clustering 21
4.1 Motivation 21
4.2 Method 22
4.3 Examples 26
Chapter 5: Experimental Results and Discussion 30
5.1 Multi-class Datasets 31
5.2 Time Series Datasets 35
5.3 Effect of the Parameter α 39
Chapter 6: Conclusions and Future Work 42
References 43
Appendix 49
[1]D. L. Olson, Y. Shi, Introduction to business data mining, McGraw-Hill/Irwin Englewood Cliffs, 2007.
[2]S. Theodoridis, K. Koutroumbas, Pattern Recognition, Elsevier, 2008.
[3]W. Li, L. Jaroszewski, A. Godzik, Clustering of highly homologous sequences to reduce the size of large protein databases, Bioinformatics 17 (3) (2001) 282–283.
[4]S.-J. Lee, C.-S. Ouyang, S.-H. Du, A neuro-fuzzy approach for segmentation of human objects in image sequences, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 33 (3) (2003) 420–437.
[5]R. Filipovych, S. M. Resnick, C. Davatzikos, Semi-supervised cluster analysis of imaging data, NeuroImage 54 (3) (2011) 2185–2197.
[6]J.-Y. Jiang, R.-J. Liou, S.-J. Lee, A fuzzy self-constructing feature clustering algorithm for text classification, IEEE Transactions on Knowledge and Data Engineering 23 (3) (2011) 335–349.
[7]R.-F. Xu, S.-J. Lee, Dimensionality reduction by feature clustering for regression problems, Information Sciences 299 (2015) 42–57.
[8]M. Wang, Y. Yu, W. Lin, Adaptive neural-based fuzzy inference system approach applied to steering control, Proceedings of International Symposium on Neural Networks (2009) 1189–1196.
[9]Y. Xu, V. Olman, D. Xu, Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees, Bioinformatics 18 (4) (2002) 536–545.
[10]C.-C. Wei, T.-T. Chen, S.-J. Lee, K-nn based neuro-fuzzy system for time series prediction, Proceedings of 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (2013) 569–574.
[11]F. Can, E. A. Ozkarahan, Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases, ACM Transactions on Database Systems 15 (4) (1990) 483–517.
[12]R. Feldman, J. Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, Cambridge University Press, 2007.
[13]S.-J. Lee, J.-Y. Jiang, Multilabel text categorization based on fuzzy relevance clustering, IEEE Transactions on Fuzzy Systems 22 (6) (2014) 1457–1471.
[14]C.-L. Liao, S.-J. Lee, A clustering based approach to improving the efficiency of collaborative filtering recommendation, Electronic Commerce Research and Applications 18 (2016) 1–9.
[15]F. M. Alvarez, A. Troncoso, J. C. Riquelme, J. S. A. Ruiz, Energy time series forecasting based on pattern sequence similarity, IEEE Transactions on Knowledge and Data Engineering 23 (8) (2011) 1230–1243.
[16]Z.-Y. Wang, S.-J. Lee, A neuro-fuzzy based method for TAIEX forecasting, Proceedings of International Conference on Machine Learning and Cybernetics (ICMLC) 1 (2) (2014) 579–584.
[17]B. Everitt, Cluster analysis, Chichester, West Sussex, UK: Wiley, 2011.
[18]T. Kohonen, Self-Organizing Maps, Springer-Verlag, 1995.
[19]K. Alsabti, S. Ranka, V. Singh, An efficient k-means clustering algorithm, Electrical Engineering and Computer Science Paper 43.
[20]Z. Huang, Extensions to the k-means algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery 2 (1998) 283–304.
[21]S.-J. Lee, C.-S. Ouyang, A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning, IEEE Transactions on Fuzzy Systems 11 (3) (2003) 341–353.
[22]H.-S. Park, C.-H. Jun, A simple and fast algorithm for k-medoids clustering, Expert Systems with Applications 36 (2009) 3336–3341.
[23]D. Sculley, Web-scale k-means clustering, Proceedings of 19th International Conference on World Wide Web (2010) 1177–1178.
[24]A. Kraskov, H. Stogbauer, R. G. Andrzejak, P. Grassberger, Hierarchical clustering based on mutual information, arXiv:q-bio/0311039v2 [q-bio.QM].
[25]G. J. Szekely, M. L. Rizzo, Hierarchical clustering via joint between-within distances: Extending Ward’s minimum variance method, Journal of Classification 22 (2005) 151–183.
[26]E. Achtert, C. Bohm, P. Kroger, DeLi-Clu: Boosting robustness, completeness, usability, and efficiency of hierarchical clustering by a closest pair ranking, Lecture Notes in Computer Science 3918 (2006) 119–128.
[27]E. Achtert, C. Bohm, P. Kroger, A. Zimek, Mining hierarchies of correlation clusters, Proceedings of 18th International Conference on Scientific and Statistical Database Management (SSDBM) (2006) 119–128.
[28]W. Zhang, D. Zhao, X. Wang, Agglomerative clustering via maximum incremental path integral, Pattern Recognition 46 (11) (2013) 3056–3065.
[29]M. Gagolewski, M. Bartoszuk, A. Cena, Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm, Information Sciences 363 (2016) 8–23.
[30]A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39 (1) (1977) 1–38.
[31]J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[32]M. A. T. Figueiredo, A. K. Jain, Unsupervised learning of finite mixture models, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (3) (2002) 381–396.
[33]K. Pal, J. Keller, J. Bezdek, A possibilistic fuzzy c-means clustering algorithm, IEEE Transactions on Fuzzy Systems 13 (4) (2005) 517–530.
[34]M. R. Fellows, J. Guo, C. Komusiewicz, R. Niedermeier, J. Uhlmann, Graph-based data clustering with overlaps, Discrete Optimization 8 (1) (2011) 2–17.
[35]A. Pérez-Suárez, J. F. Martinez-Trinidad, J. A. Carrasco-Ochoa, J. E. Medina-Pagola, OClustR: A new graph-based algorithm for overlapping clustering, Neurocomputing 121 (2013) 234–247.
[36]S. Baadel, F. Thabtah, J. Lu, Multi-cluster overlapping k-means extension algorithm, Proceedings of International Conference on Machine Learning and Computing, 2015.
[37]C. Améndola, J.-C. Faugère, B. Sturmfels, Moment varieties of Gaussian mixtures, Journal of Algebraic Statistics 7 (1) (2016) 14–28.
[38]M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large databases with noise, Proceedings of 2nd ACM International Conference on Knowledge Discovery and Data Mining (1996) 226–231.
[39]A. Hinneburg, D. A. Keim, An efficient approach to clustering in large multimedia databases with noise, Proceedings of 4th ACM International Conference on Knowledge Discovery and Data Mining (1998) 58–65.
[40]H.-P. Kriegel, P. Kroger, J. Sander, A. Zimek, Density-based clustering, WIREs Data Mining and Knowledge Discovery 1 (3) (2011) 231–240.
[41]R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proceedings of ACM International Conference on Management of Data (1998) 94–105.
[42]C.-H. Cheng, A. W. Fu, Y. Zhang, Entropy-based subspace clustering for mining numerical data, Proceedings of 5th ACM International Conference on Knowledge Discovery and Data Mining (1999) 84–93.
[43]K. Kailing, H.-P. Kriegel, P. Kroger, Density-connected subspace clustering for high-dimensional data, Proceedings of SIAM International Conference on Data Mining (SDM’04) (2004) 246–257.
[44]R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, Automatic subspace clustering of high dimensional data, Data Mining and Knowledge Discovery 11 (2005) 5–33.
[45]E. Achtert, C. Bohm, H.-P. Kriegel, P. Kroger, I. Muller-Gorman, A. Zimek, Detection and visualization of subspace cluster hierarchies, Lecture Notes in Computer Science 4443 (2007) 152–163.
[46]H.-P. Kriegel, P. Kroger, A. Zimek, Subspace clustering, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2 (4) (2012) 351–364.
[47]B. J. Frey, D. Dueck, Clustering by passing messages between data points, Science 315 (5814) (2007) 972–976.
[48]C.-S. Ouyang, W.-J. Lee, S.-J. Lee, A TSK-type neuro-fuzzy network approach to system modeling problems, IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics 35 (4) (2005) 751–767.
[49]X. Huang, Y. Ye, L. Xiong, R. Lau, N. Jiang, S. Wang, Time series k-means: A new k-means type smooth subspace clustering for time series data, Information Sciences 367 (2016) 1–13.
[50]X. Huang, Y. Ye, H. Guo, Y. Cai, H. Zhang, Y. Li, DSKmeans: a new k-means-type approach to discriminative subspace clustering, Knowledge-Based Systems 70 (2014) 293–300.
[51]A. Asuncion, D. Newman, The UCI machine learning repository.
[52]K-means, https://www.mathworks.com/help/stats/kmeans.html.
[53]Fuzzy c-means, https://www.mathworks.com/help/fuzzy/fcm.html.
[54]Gaussian mixture model, https://en.wikipedia.org/wiki/Mixture_model.
[55]Matlab, https://www.mathworks.com/products/matlab.html.
[56]GMM source code, http://blog.pluskid.org/?p=39.
[57]Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, G. Batista, The UCR time series classification archive.