National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

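Graduate student: 詹勝凱 (Shen-kai Jan)
Thesis title: 以特徵維度與群集碰撞偵測為基礎之群集有效性指標的研究與實作
Thesis title (English): Design and implementation of a feature dimension-aware validity index with cluster collision detection
Advisor: 葉建華 (Jian-hua Yeh)
Oral examination committee: 陳倩瑜, 黃信貿
Oral examination date: 2015-07-17
Degree: Master's
University: Aletheia University (真理大學)
Department: Department of Computer Science and Information Engineering, Master's Program (資訊工程學系碩士班)
Field of study: Engineering
Discipline: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar, i.e. 2014–2015)
Language: Chinese
Number of pages: 62
Keywords: K-Means, clustering, unsupervised learning, intra-cluster density, inter-cluster relationship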
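This thesis focuses on improving the CDV cluster validity index (Cluster Discreteness Validity Index) previously designed by Yeh and Joung. Traditional cluster validity indices are mostly built on intra-cluster and inter-cluster relationships. The earlier CDV index, by contrast, combines a term for cluster under-partitioning with a penalty function that was not well justified. Having identified the problems of both the traditional indices and CDV, we design the new index around two concerns: under-partitioning and over-partitioning.

Because CDV lacks a sound way of handling over-partitioned clusters, we redesign the penalty score applied when clusters are over-partitioned, and we combine the under-partitioning and over-partitioning terms in a more principled way so that the two phenomena are treated in balance. Computing validity over the cluster distribution with this new version is intended to yield a more appropriate evaluation, with the goal of developing the best-performing validity index.

The proposed index, named CDV2’, measures the quality of the clustering results produced for a data set. To demonstrate its performance and advantages, the experiments compare it against several established indices: DI, DBI, ADI, SI, PBM (arguably one of the most effective indices of recent years), and our earlier CDV. The data sets were also chosen carefully to show that CDV2’ generalizes: five real-world data sets and five synthetic data sets designed to simulate real-world data distributions. All data sets were tested experimentally to demonstrate the favorable properties of CDV2’.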
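The abstract evaluates indices by how well they rank clusterings that are under-partitioned, correct, or over-partitioned. This record does not give the CDV or CDV2’ formulas, so the sketch below does not implement them. It is a minimal, standard-library Python illustration of how the baseline indices DBI, DI, SI (mean silhouette width) and PBM score such candidates, assuming their common textbook definitions. The Dunn index here uses single-linkage separation and complete diameter, and ADI is omitted. The thesis may use different variants. The toy data and the `candidates` labelings are hypothetical.

```python
import math
from itertools import combinations


def group(data, labels):
    """Collect points into clusters, keyed by label in order of first appearance."""
    clusters = {}
    for point, label in zip(data, labels):
        clusters.setdefault(label, []).append(point)
    return list(clusters.values())


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def davies_bouldin(data, labels):
    """DBI: mean over clusters of the worst (S_i + S_j) / d(c_i, c_j); lower is better."""
    clusters = group(data, labels)
    cents = [centroid(c) for c in clusters]
    # S_i: average distance from the members of cluster i to its centroid
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


def dunn(data, labels):
    """DI: smallest gap between two clusters / largest cluster diameter; higher is better."""
    clusters = group(data, labels)
    min_gap = min(min(math.dist(p, q) for p in a for q in b)
                  for a, b in combinations(clusters, 2))
    max_diam = max(max((math.dist(p, q) for p, q in combinations(c, 2)), default=0.0)
                   for c in clusters)
    return min_gap / max_diam


def silhouette(data, labels):
    """SI: mean of s(i) = (b - a) / max(a, b) over all points; higher is better."""
    clusters = group(data, labels)
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of p's own cluster
            a = sum(math.dist(p, q) for q in cluster) / (len(cluster) - 1)
            # b: mean distance to the nearest other cluster
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def pbm(data, labels):
    """PBM: ((1/K) * (E_1 / E_K) * D_K) ** 2; higher is better."""
    clusters = group(data, labels)
    cents = [centroid(c) for c in clusters]
    g = centroid(data)
    e1 = sum(math.dist(p, g) for p in data)  # scatter around the global centroid
    ek = sum(math.dist(p, m) for c, m in zip(clusters, cents) for p in c)
    dk = max(math.dist(a, b) for a, b in combinations(cents, 2))
    return (e1 / ek * dk / len(clusters)) ** 2


# Hypothetical toy data: three well-separated 2-D blobs (A, B, C) of four points each.
blob = [(0, 0), (0, 1), (1, 0), (1, 1)]
data = [(x + dx, y + dy) for dx, dy in [(0, 0), (10, 0), (5, 10)] for x, y in blob]

# Hypothetical candidate partitions, keyed by number of clusters k.
candidates = {
    2: [0] * 8 + [1] * 4,                 # under-partitioned: blobs A and B merged
    3: [0] * 4 + [1] * 4 + [2] * 4,       # matches the generating blobs
    4: [0] * 4 + [1] * 4 + [2, 2, 3, 3],  # over-partitioned: blob C split in two
}

best = {
    "DBI": min(candidates, key=lambda k: davies_bouldin(data, candidates[k])),
    "DI": max(candidates, key=lambda k: dunn(data, candidates[k])),
    "SI": max(candidates, key=lambda k: silhouette(data, candidates[k])),
    "PBM": max(candidates, key=lambda k: pbm(data, candidates[k])),
}
print(best)
```

DBI is minimized and the other three indices are maximized. On this easy, well-separated example all four favor k = 3. On real data sets they can disagree, which is why the thesis benchmarks several of them side by side.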
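1. Introduction
1.1. Preface
1.2. Research Motivation
2. Related Work
2.1. Data Clustering
2.2. Cluster Validity Indices
2.2.1. Davies-Bouldin Index (DBI)
2.2.2. Dunn Index (DI)
2.2.3. Alternative Dunn Index (ADI)
2.2.4. Silhouette Index
2.2.5. PBM Index
3. Research Methods
3.1. Definition of the CDV Index
3.2. Design Characteristics of CDV
3.3. Toward a More Ideal Cluster Validity Index Design
3.4. Definition of the New Cluster Validity Index
4. Experimental Procedure
4.1. Data Sets
4.2. Experimental Results
5. Conclusion
References
Appendix: Tables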

[1] David A. Freedman, Statistical Models: Theory and Practice, Cambridge University Press, 2005.
[2] Marina Meilă, "Comparing Clusterings by the Variation of Information," Learning Theory and Kernel Machines, pp. 173–187, 2003.
[3] Tutorial 12, Decision Trees Interactive Tutorial and Resources, online, http://www.cs.cmu.edu/~schneide/tut5/node42.html
[4] K. Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space," Philosophical Magazine, vol. 2, no. 6, pp. 559–572, 1901.
[5] G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, Wiley-Interscience, 2004.
[6] Peter J. Rousseeuw, "Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis," Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987.
[7] J. H. Yeh and F. J. Joung, "CDV Index: A Validity Index for Better Clustering Quality Measurement," in Proceedings of the 2014 Conference on Artificial Intelligence and Data Mining, Suzhou, China, March 2014.
[8] R. L. Gorsuch, Factor Analysis, Hillsdale, NJ: Lawrence Erlbaum, 1983.
[9] UCI Machine Learning Repository, online, https://archive.ics.uci.edu/ml/index.html
[10] I. Borg and P. Groenen, Modern Multidimensional Scaling: Theory and Applications, 2nd ed., New York: Springer-Verlag, pp. 207–212, 2005.
[11] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, no. 5500, pp. 2319–2323, 22 December 2000.
[12] Sam Roweis and Lawrence Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 22 December 2000.
[13] Thomas Landauer, P. W. Foltz, and D. Laham, "Introduction to Latent Semantic Analysis," Discourse Processes, vol. 25, pp. 259–284, 1998.
[14] T. Hofmann, "Unsupervised Learning by Probabilistic Latent Semantic Analysis," Machine Learning, vol. 42, no. 1, pp. 177–196, 2001.
[15] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, vol. 3, no. 5, pp. 993–1022, 2003.
[16] J. B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, pp. 281–297, 1967.
[17] Jian-hua Yeh, Chen Lin, and Yuan-ling Chang, "Classification Improvement Based on Feature Combination and Topic Vector Model," in Proceedings of the 2012 International Conference on Systems and Informatics (ICSAI 2012), Yantai, China, May 2012.
[18] D. L. Davies and D. W. Bouldin, "A Cluster Separation Measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979.
[19] J. C. Dunn, "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters," Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.
[20] Imran Shafi, "Validity-Guided Fuzzy Clustering Evaluation for Neural Network-Based Time-Frequency Reassignment," EURASIP Journal on Advances in Signal Processing, 2010.
[21] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2009, p. 233.
[22] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by Simulated Annealing," Science, vol. 220, no. 4598, pp. 671–680, 1983.
[23] UCI Machine Learning Repository, online, http://archive.ics.uci.edu/ml/
[24] Abalone data set, UCI Machine Learning Repository, online, http://archive.ics.uci.edu/ml/datasets/Abalone
[25] Image Segmentation data set, UCI Machine Learning Repository, online, http://archive.ics.uci.edu/ml/datasets/Image+Segmentation