跳到主要內容

臺灣博碩士論文加值系統

(44.200.77.92) 您好!臺灣時間:2024/03/01 10:25
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:蘇俊吉
研究生(外文):Chun-Chi Su
論文名稱:有效率的階層式分群演算法
論文名稱(外文):Efficient Hierarchical Clustering Algorithm
指導教授:潘正祥
指導教授(外文):Jeng-Shyang Pan
學位類別:碩士
校院名稱:國立高雄應用科技大學
系所名稱:電子與資訊工程研究所碩士班
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2007
畢業學年度:95
語文別:中文
論文頁數:65
中文關鍵詞:資料分群向量量化快速搜尋演算法階層式分群
外文關鍵詞:clusteringVQvector quantizationfast search algorithmHierarchical ClusteringSingle-LinkComplete-LinkAverage-Link
相關次數:
  • 被引用被引用:9
  • 點閱點閱:2727
  • 評分評分:
  • 下載下載:252
  • 收藏至我的研究室書目清單書目收藏:0
在資料分群中,常使用資料與資料間的屬性或相似性來判斷是否屬於同一群組,而相似性大多使用資料之間的歐式距離來決定,以階層式分群演算法(Hierarchical Cluster,如Single-Link,Complete-Link等)作為資料分群是一種簡單易判別且常用的方法,本文研究的方向是改善階層式分群(Hierarchical Cluster)方法的效率減少所有的資料與群組間繁複的計算,運用在向量量化中快速搜尋碼向量的方法,排除相似度較低的資料,減少資料運算的次數,進而節省分群所花費的時間,達成加速分群的結果。
文中介紹各種分群方法的分類,並對幾種聚合型階層式分群演算法的分群結果加以評估,也說明影像向量量化的原理與探討幾種向量量化中快速搜尋的方法,使用數學不等式的模型加以證明。
提出將資料投影至一組單範正交的向量基底,利用正交與不等式的原理,導出Single-Link、Complete-Link與Average-Link等三種常用的聚合型階層式分群演算法的改善方法,以不同的資料型態、維度及數量與傳統的分群方式加以比較及實驗,在實驗結果中本文的方法與傳統的方式可以得到完全相同的分群,在適合這三種分類的資料型態中,本文提出的方法可以更快速更有效率的得到分群結果。
The attribute or similarity is usually used to judge whether the data sets belong to the same cluster and the Euclidean distance is usually applied to determine the similarity. The hierarchical clustering algorithms such as the Single-Link algorithm and the Complete-Link algorithm are commonly used methods. This thesis applied the efficient codeword search methods and improved those methods for efficient clustering based on the Single-Link, Complete-Link and Average-Link algorithms. An improved inequality is developed from the sub-vector, projection and triangular inequality elimination techniques so as to reduce the clustering time while keep the same clustering results. Experimental results demonstrate the usefulness of the proposed efficient clustering algorithms.
目 錄
第一章 緒論 1
1.1 研究動機 1
1.2 研究目的 2
1.3 主要貢獻 2
1.4 論文架構 3
第二章 資料分群演算法 4
2.1資料分群 4
2.1.1資料分群基本概念 4
2.1.2資料分群基本定義 4
2.2資料分群的種類 5
2.2.1資料分群的問題 5
2.2.2資料分群的種類 8
2.3聚合型階層式分群演算法 11
2.3.1四種典型聚合型分群演算法 12
2.3.2四種典型聚合型分群演算法優缺點評估 14
第三章 向量量化 17
3.1向量量化的介紹 17
3.1.1向量量化的原理 18
3.1.2向量量化編碼距離的計算 20
3.2向量量化快速搜尋演算法之研究 21
3.2.1三角不等式化簡(Triangle Inequality Elimination, TIE)準則 22
3.2.2 Equal-average nearest-neighbor search (ENNS) Algorithm 25
3.2.3 IENNS搜尋演算法 28
3.2.4 Equal-average Equal-variance Nearest-Neighbor Search (EENNS) Algorithm 31
3.2.5 改善EENNS及增加效率的搜尋演算法 33
3.2.6 利用投影與三角不等式的快速搜尋演算法 36
第四章 改良階層式資料分群演算法原理 39
4.1改良階層式資料分群演算法基本架構 39
4.2 Efficient Single-Link Algorithm 41
4.3 Efficient Complete-Link Algorithm 42
4.4 Efficient Average Link Algorithm 43
第五章 研究結果 45
5.1 實驗說明 45
5.2 Single-Link Algorithm實驗及數據 45
5.3 Complete-Link Algorithm實驗及數據 51
5.4 Average Link Algorithm實驗及數據 55
5.5實驗結果 59
第六章 結論與未來展望 60
6.1結論 60
6.2未來展望 60
參考文獻 62


表 目 錄
表2-1四種計算群集間距離的方式 13
表2-2四種演算法求出的距離 14
表5-1兩種Single-Link分群方法平均執行結果 50
表5-2兩種Single-Link不同群組方式分群方法結果 51
表5-3兩種Complete-Link分群方法平均執行結果 54
表5-4兩種Average-Link分群方法平均執行結果 58


圖 目 錄
圖2-1: 易造成分群錯誤的資料群集問題 7
圖2-2: 凝聚型與分裂型的演算法,對資料物件{a, b, c, d, e}分群過程 8
圖2-3: 樹狀結構圖(dendrogram),對資料物件{a, b, c, d, e}分群過程 9
圖2-4: 資料分群的種類 11
圖2-5: 資料群集 , 14
圖2-6: Complete-link傾向找出圓形群集 16
圖3-1: 影像向量量化 18
圖3-2: 影像資料壓縮過程圖示 20
圖3-3: 三角不等式化簡準則ㄧ 23
圖3-4: 三角不等式化簡準則二 24
圖3-5: 三角不等式化簡準則三 25
圖3-6: ENNS 搜尋的過程及範圍 28
圖3-7: IENNS 搜尋的過程及範圍 30
圖3-8: EENNS 搜尋的過程及範圍 33
圖3-9: 向量基底的圖形 37
圖5-1: 5群500 objects Single-Link分群資料 45
圖5-2: 8群1000 objects Single-Link分群資料 46
圖5-3: 6群2000 objects Single-Link分群資料 46
圖5-4: 6群3000 objects Single-Link分群資料 47
圖5-5: 9群4000 objects Single-Link分群資料 47
圖5-6: 4群4000 objects Single-Link分群資料 48
圖5-7: 6群3000 objects Single-Link分群資料 48
圖5-8: 8群4000 objects Single-Link分群資料 49
圖5-9: 8群4000 objects Single-Link分群資料 49
圖5-10: 8群1000 objects Complete-Link分群資料 52
圖5-11: 9群2000 objects Complete-Link分群資料 52
圖5-12: 9群4000 objects Complete-Link分群資料 53
圖5-13: 4群2000 objects Complete-Link分群資料 53
圖5-14: 4群2000 objects Complete-Link分群資料 54
圖5-15: 8群1000 objects Average-Link分群資料 55
圖5-16: 9群2000 objects Average-Link分群資料 56
圖5-17: 8群1000 objects Average-Link分群資料 56
圖5-18: 4群1000 objects Average-Link分群資料 57
圖5-19: 4群1000 objects Average-Link分群資料 57
圖5-20: 4群2000 objects Average-Link分群資料 58
參考文獻
[1] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, New Jersey, 1988.
[2] L. Kaufman and P. J. Roussesuw, Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley and Sons, New York, 1990.
[3] J. A. Hartigan, Clustering Algorithms, John Wiley and sons, Inc., New York, 1975.
[4] M. R. Anderberg, Cluster Analysis for Applications, Academic Press, Inc., New York, 1973.
[5] T. R. Babu and M. N. Murty, “Comparison of genetic algorithm based prototype selection schemes,” Pattern Recognition, vol. 34, pp. 523-525.
[6] A. K. Jain and P. J. Flynn, Image Segmentation Using Clustering, IEEE Press, Piscataway, NJ.
[7] J. M. Jolion, P. Meer and S. Bataouche, “Robust clustering with applications in computer vision,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, No. 8, pp. 791-802, 1991.
[8] E. Rasumssen, Clustering Algorithms. In Information Retrieval: Data Structures and Algorithms, Prentice-Hall, Inc., Upper Saddle River, 1992.
[9] R. Krishnapuram, A. Joshi and L. Yi, “A fuzzy relative of the k-medoids algorithm with application to web document and snippet clustering,” in IEEE International Fuzzy Systems Conference, Seoul, Korea, pp. 1281-1286, 1999.
[10] B. Mobasher, R. Cooley and J. Srivatstava, “Creating adaptive web sites through usage-based clustering of URLs,” in Knowledge and Data Engineering Workshop, 1999.
[11] O. Zamir, O. Etzion, O. Mandani and R. Karp, “Fast and intuitive clustering of web documents,” in Third International Conference on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, California, Newport Beach, CA, USA, pp. 287-290, 1997.
[12] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer, Boston, MA, 1992.
[13] R. Agrawal, J. Gehrke, D. Gunopulos and P. Raghavan, “Automatic subspace clustering of high dimensional data for data mining applications,” in ACM SIGMOD International Conference on the Management of Data, Seattle, pp. 94-105, 1998.
[14] S. Guha, R. Rastogi and K. Shim, “CURE: an efficient clustering algorithm for large databases,” in ACM SIGMOD International Conference on the Management of Data, Seattle, WA, USA, pp. 73-84, 1998.
[15] M. Ester, H. P. Kriegel, J. Sander and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in E. Simoudis, J. Han and U. Fayyad, eds., in Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, Portland, Oregon, pp. 226-231.
[16] O. R. Zaïane and C. H. Lee, “Clustering spatial data when facing physical constraints,” in IEEE International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, IEEE Computer Society, pp. 737-740, 2002.
[17] R. M. Gray, “Vector quantization,” IEEE Acoust, Speech, Signal Processing Magazine, pp. 4-29, Apr. 1984.
[18] V. J. Mathews, “Multiplication free vector quantization using L1 distortion measure and its variants,” IEEE Trans. Image Process., vol. 40, pp. 11-17, Tan. 1992.
[19] H. Q. Cao and W. Li, “A fast search algorithm for vector quantization using a directed graph,” IEEE Trans Circuits Syst. Video Technol., vol. 10, pp.585-593, Jun. 2000.
[20] C. D. Bei and R. M. Gray, “An improvement of the minimum distortion encoding algorithm for vector quantization,” IEEE Trans. Commum., vol.COM-33, pp. 1132-1133, Oct. 1985.
[21] T. S. Chen and C. C. Chang, “Diagonal axes method (DAM): A fast search algorithm for vector quantization,” IEEE Trans. on Circuits and System for video Technology, vol. 7, no. 3, pp. 555-559, 1997.
[22] C.-H. Hsieh and Y.-J. Liu, “Fast search algorithms for vector quantization of images using multiple triangle inequalities and wavelet transform,” IEEE Trans. Image Processing, vol. 9, pp. 321-328, Mar. 2000.
[23] S. W. Ra and J. K. Kim, “A fast mean-distance-ordered partial codebook search algorithm for image vector quantization,” IEEE. Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 40, no. 9,pp.576-579,1993.
[24] L. Guan and M. Kamel, “Equal-average hyperplane partitioning method for vector quantization of image data,” Pattern Recognit. Lett., pp.693–699, Oct. 1992.
[25] J. S. Pan, Z. M. Lu, and S. H. Sun, “Fast codeword search algorithm for image coding based on mean-variance pyramids of codewords,” Electorn. Lett,. vol.36, no. 3, pp. 210-211, Feb. 2000.
[26] B. C. Song and J. B. Ra, “A fast search algorithm for vector quantization using L2-norm pyramid of codewords,” IEEE Trans. Image Processing, vol. 11, pp.10-15, Jan. 2002.
[27] Shu-Chuan Chu, 2004, Improved Clustering and Soft Computing Algorithms, Ph.D thesis.
[28] J. S. Pan and K. C. Huang, “A new vector quantization image coding algorithm based on the extension of the bound for minkowski metric,” Pattern Recognit., vol. 31, no. 11, pp. 1757–1760, 1998.
[29] C. H. Lee and L. H. Chen, “Fast closest codeword search algorithm for vector quantization,” Proc. Inst. Elect. Eng., vol. 141, no. 3, pp.143–148, 1994.
[30] S. J. Baek, B. K. Jeon, and K. M. Sung, “A fast encoding algorithm for vector quantization,” IEEE Signal Processing Lett., vol. 4, pp. 325–327,Feb. 1997.
[31] J. S. Pan, Z. M. Lu, and S. H. Sun, “An efficient encoding algorithm for vector quantization based on subvector technique,” IEEE Trans. Image Processing, vol. 12, pp. 265–270, Mar. 2003.
[32] Jim Z. C. Lai and Yi-Ching Liaw, “Fast-Searching Algorithm for Vector Quantization Using Projection and Triangular Inequality”, IEEE Trans. Image Processing, vol. 13, No. 12, Dec. 2004.
[33] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, New Jersey, 1988..
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top