Author: Maisyatus Suadaa Irfana
Title: Data Clustering with Complete Must-Link Constraints
Advisor: Dr. Chao-Lung Yang
Committee Member: Dr. Chao-Lung Yang
Oral Defense Date: 2013-12-18
Degree: Master's
Institution: National Taiwan University of Science and Technology
Department: Department of Industrial Management
Discipline: Business and Management
Subfield: Other Business and Management
Document Type: Academic thesis
Year of Publication: 2014
Graduation Academic Year: 102 (2013–2014)
Language: English
Pages: 67
Keywords: Constrained Clustering; Complete Must-Link; Principal Component Analysis; Data Overlapping
Abstract (translated from Chinese): The goal of this study is to develop an integrated data clustering method to solve a common type of constrained clustering problem, the Complete Must-Link (CML) constraint problem. Constrained clustering is a semi-supervised learning approach that focuses on using known group relationships among data, expressed as constraints, to improve the efficiency and accuracy of cluster analysis. A Complete Must-Link (CML) constraint means that every data point belongs in advance to a particular sub-group, and the constraint forces the clustering to assign all members of the same CML group to the same cluster. When conventional constrained clustering methods handle CML constraints, each CML sub-group is replaced by its centroid, which causes an information-loss problem. This problem becomes more severe when the sub-groups overlap one another. This study proposes a new clustering method to improve constrained clustering. First, we propose a measure of the degree of overlap among CML sub-groups (the overlapped ratio). When the overlap among CML sub-groups is high, principal component scores that characterize each data group are added to the similarity matrix, and a conventional clustering method is then applied to this matrix to obtain the clustering result. Both computer simulations and real-world data are used to evaluate the performance of the new method. The experimental results confirm that the proposed method effectively mitigates the information loss in CML constrained clustering and yields better clustering results.
Abstract (English): This research aims to develop an integrated method for solving a special but not uncommon constrained clustering problem constructed by Complete Must-Link (CML) constraints. Constrained clustering is a semi-supervised learning approach that incorporates available side information to improve the efficiency and purity of clustering. The CML clustering problem can be viewed as aggregating pre-defined data groups. Through the transitive closure of this aggregation, the data in each group are replaced by their centroid for clustering analysis. This causes an information-loss issue: the distribution and shape of each original group are discarded, which is especially harmful when the groups overlap one another. To overcome this problem, this research proposes a new method, named PCA-CML, for the CML constrained clustering problem. Principal component analysis (PCA), which provides supplemental information describing the original partition blocks, is incorporated into the distance matrix of the constrained clustering algorithm when the blocks overlap. An overlapped ratio is introduced to determine whether the CML data partitions overlap. We test the proposed algorithm on simulated datasets, both overlapped and non-overlapped, and on a real-world dataset containing cartridge quality information. The experimental results show that the proposed algorithm alleviates the information-loss issue in CML constrained clustering when the pre-defined CML partitions overlap.
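The baseline aggregation step that the abstract describes — collapsing each pre-defined CML group to its centroid and then clustering the centroids — can be sketched as follows. This is an illustrative NumPy sketch, not the thesis's PCA-CML implementation; the function name, the plain k-means used on the centroids, and all parameter choices are assumptions for illustration.

```python
import numpy as np

def cml_centroid_clustering(X, groups, k, iters=50, seed=0):
    """Baseline CML constrained clustering sketch: collapse each
    pre-defined must-link group to its centroid, cluster the
    centroids with plain k-means, then propagate labels back.
    (The thesis's PCA-CML method additionally augments the distance
    information with PCA scores when the groups overlap.)"""
    # Collapse each CML group to its centroid. This is the step where
    # the information-loss issue arises: the shape/distribution of
    # each group is discarded, which hurts when groups overlap.
    centroids = np.array([X[idx].mean(axis=0) for idx in groups])

    # Plain k-means on the group centroids.
    rng = np.random.default_rng(seed)
    centers = centroids[rng.choice(len(centroids), k, replace=False)]
    for _ in range(iters):
        # Distance of every group centroid to every cluster center.
        d = np.linalg.norm(centroids[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = centroids[labels == j].mean(axis=0)

    # Every data point inherits the cluster of its CML group,
    # which enforces the complete must-link constraint.
    point_labels = np.empty(len(X), dtype=int)
    for g, idx in enumerate(groups):
        point_labels[idx] = labels[g]
    return point_labels
```

For well-separated groups this recovers the expected partition; the thesis's contribution concerns the overlapped case, where the centroid replacement above loses too much information.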
Table 1. Comparison of clustering algorithms 12
Table 2. Covariance matrix (3-dimensional) 17
Table 3. Notation of the overlapped ratio 23
Table 4. Experimental results: non-overlapped dataset 36
Table 5. Experimental results: overlapped dataset 38
Table 6. Experimental results: cartridge dataset 42
Table 7. Experimental results: Yeast dataset 44
Table 8. CML information 45
Table 9. Experimental results: Auto MPG dataset 46