臺灣博碩士論文加值系統

English |FB 專頁 |Mobile

免費會員登入| 註冊

功能切換導覽列

(216.73.216.136) 您好！臺灣時間：2025/09/20 22:57

字體大小：

:::

詳目顯示

第 1 筆 / 共 1 筆

/1頁

論文基本資料
摘要
外文摘要
目次
參考文獻
電子全文
紙本論文
QR Code

本論文永久網址:

研究生:

林正芳

研究生(外文):

Cheng-Fang Lin

論文名稱:

以重力理論為基礎的二階段階層式資料分群演算法特性之研究

論文名稱(外文):

A Study on the Characteristics of A Two-Phase Hierarchical Clustering Algorithm Based on Gravity Theory

指導教授:

歐陽彥正

學位類別:

碩士

校院名稱:

國立臺灣大學

系所名稱:

資訊工程學研究所

學門:

工程學門

學類:

電資工程學類

論文種類:

學術論文

論文出版年:

2002

畢業學年度:

語文別:

中文

論文頁數:

中文關鍵詞:

階層式演算法、資料分群、重力理論

外文關鍵詞:

clustering algorithm、hierarchical clustering、gravity based clustering

相關次數:

被引用:8
點閱:217
評分:
下載:28
書目收藏:1

本論文介紹一個以重力理論為基礎的二階段階層式資料分群演算法，二階段的設計是為了要提高分群的效果。第一階段的目的為，在資料中找出形狀為圓形的群集。在這一階段裡，將每筆資料視為一個質點，模擬在K維空間中n個質點受到彼此間重力吸引作用而移動、碰撞的過程，而自動將資料分為數個圓形群集。在第二階段裡，我們模擬圓形群集，受到重力影響，在樹狀圖中合併的過程。
本論文研究的演算法，針對分佈形狀不規則的群集亦能有效的分群，並以數值型資料檢驗其分群效果。本論文採用三組二維節點的資料集，檢驗演算法中參數對於分群結果的影響，並以CHAMELEON所產生的資料集與Oyang提出的重力演算法做比較，發現本文演算法的確在不規則形狀群集的分群上，得到更好的分群效果。同時本文演算法亦可避免鏈結效應問題及分群傾向圓球形群集的問題。
本論文所提出演算法之時間複雜度，在資料數目n大量增加時，仍能維持在O(n2)，而與著名的幾種階層式分群演算法相比，除了Single-link外，本論文之分群演算法的時間複雜度和空間複雜度均比較低。

This thesis reports a study on a two-phase hierarchical data-clustering algorithm that operates based on gravity theory in physics. The design of the two-phase algorithm is aimed at achieving high clustering quality. The main purpose of the first phase of the algorithm is to identify those subclusters that are of spherical shapes. In this phase, each data instance is regarded as an object in a multidimensional vector space and a simulation is conducted to figure out how these objects influence each other and merge to form subclusters due to the gravity force. Then, those subclusters with spherical shapes are identified and an abstract model is employed to record the essential information of each spherical subcluster. In the second phase of the algorithm, a simulation is conducted to figure out how the spherical subclusters merge to form clusters at higher levels of the dendrogram due to the gravity force. In the study reported in this thesis, comprehensive experiments were conducted to figure out the optimal ranges of parameter settings and how the two-phase clustering algorithm compares with the existing hierarchical clustering algorithms with respect to clustering quality. Experimental results reveal that, if the parameters are properly set, then the two-phase clustering algorithm generally is more powerful in identifying clusters with arbitrary shapes than the existing hierarchical clustering algorithms.

第一章緒論 1
1.1 資料分群的興起 1
1.2 研究動機及目的 2
1.3 論文架構 2
第二章資料分群演算法之研究 3
2.1 定義 3
2.1.1屬性 3
2.1.2相似度 4
2.2 資料分群演算法的四種類型 4
2.2.1分組式資料分群演算法（partition clustering） 5
2.2.2階層式資料分群演算法（hierarchical clustering） 5
2.2.3密度基礎式分群演算法（density-based clustering） 6
2.2.4 網格式資料分群演算法（grid-based clustering） 7
2.3 聚合型階層式資料分群演算法 8
2.3.1傳統的聚合型階層式分群演算法 8
2.3.2 CURE 10
2.3.3 CHAMELEON 10
2.3.4 BIRCH 11
2.3.5 重力分群演算法 12
2.4 階層式分群演算法的問題 12
2.5 複雜度 14
第三章二階段的重力分群演算法 16
3.1 模擬系統 16
3.2 物理屬性及定律 17
3.3 演算法的介紹 19
3.3.1 第一階段（Phase One） 20
3.3.2 找出圓形群集、消除雜訊 22
3.3.3 第二階段（Phase Two） 25
3.4 演算法流程圖 31
3.5 複雜度分析 32
第四章實驗結果 34
4.1 實驗一 34
4.1.1 資料集 34
4.1.2 實驗結果 35
4.2 實驗二 36
4.2.1 資料集 37
4.2.2 實驗結果 37
4.3 實驗三 38
4.3.1 資料集 38
4.3.2 實驗結果 39
4.4 實驗四 41
4.4.1 資料集 41
4.4.2 實驗結果 41
4.5 實驗五 43
4.5.1 資料集 43
4.5.2 實驗結果 43
4.6 實驗六 45
4.6.1 資料集 45
4.6.2 實驗結果 45
4.7 實驗七 46
4.7.1 資料集 46
4.7.2 實驗結果 46
4.8 實驗八 47
4.8.1 資料集 47
4.8.2 實驗結果 49
4.9 實驗九 51
第五章結論與未來展望 53
5.1 結論 53
5.2 未來展望 54
參考文獻 55

[1] A.K. Jain, M.N. Murty and P. J. Flynn. Data Clustering: A Review. ACM Computing Surveys, vol. 31, no. 3, pp.264-323, 1999.
[2] W.E. Wright. A formalization of cluster analysis and gravitational clustering. Doctoral Dissertation. Washington University, 1972.
[3] Yen-Jen Oyang, Chien-Yu Chen, and Tsui-Wei Yang. A Study on the Hierarchical Data Clustering Algorithm Based on Gravity Theory, Proceedings of 5th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 350-361, 2001.
[4] Yen-Jen Oyang, Chien-Yu Chen, Shien-Ching Huang, and Cheng-Fang Lin. Characteristics of a Hierarchical Data Clustering Algorithm Based on Gravity Theory, Technical Report of NTUCSIE 02-01.
(Available at http://mars.csie.ntu.edu.tw/~cychen/publications_on_dm.htm)
[5] Chien-Yu Chen, Shien-Ching Huang, and Yen-Jen Oyang. An Incremental Hierarchical Data Clustering Algorithm Based on Gravity Theory, Proceedings of 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, pp.237-250, 2002.
[6] G. Karypis, E.-H. Han, V. Kumar. CHAMELEON: A hierarchical Clustering Algorithm Using Dynamic Modeling. COMPUTER, Vol. 32, pp.68-75, 1999.
[7] S. Guha, R. Rastogi and K. Shim. Cure: An efficient clustering algorithm for large databases. In Proc. 1999 ACM-SIGMOD Int Conf. Management of Data (SIGMOD’98), pp. 73-84, Seattle, WA, 1998.
[8] T. Zhang, R. Ramakrishnan, M. Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proc. of the 1996 ACM-SIGMOD International Conference on Management of (SOGMOD-96) , pp. 103-114 , Jun. 1996
[9] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, 2000.
[10] M. Ester, H-P. Kriegel, J. Sander, X. Xu. Clustering for Mining in Large Spatial Databases. Special Issue on Data Mining, KI-Journal, ScienTec Publishing, Vol. 1, pp. 18-24, 1998.
[11] D. Eppstein. Fast hierarchical clustering and other applications of dynamic closet pairs. The ACM Journal of Experimental Algorithms, vol. 5, no.1, pp.1-23, 2000.
[12] J.Orear. Physics. Macmillan Publishing Co., Inc., New York, 1979
[13] I.H. Witten, E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann Publishers, San Francisco, 2000.
[14] R.C. Dubes. How many clusters are best? --- an experiment. Pattern Recognition, vol. 20, no. 6, pp. 645-663, 1987.
[15] R.V. Hogg and E.A. Tanis, Probability and statistical inference, Prentice Hall, New Jersey, 2001.
[16] Kurita, T., An efficient agglomerative clustering algorithm using a heap, Pattern Recognition, Vol. 24, No.3, pp. 205-209, 1991.
[17] Richard O. Duda, Peter E. Hart and David G. Stork. Pattern Classification, Wiley-Interscience Publication, 2001.
[18] R. Ng, J. Han. "Efficient and Effective Clustering Methods for Spatial Data Mining". Proceeding of the 20''" VLDB Conference, Santiago, Chile, 1994.
[19] P.S. Bradley and U.M. Fayyad. Refining initial points for K-Means clustering. In Proc. 15th Intl. Conf. on Machine Learning, pages 91--99, 1998
[20] P.S. Bradley and U.M. Fayyad and C. Reina. Scaling clustering algorithms to large databases. In Proc. 1998 Int. Conf. Knowledge Discovery and Data Mining(KDD’98), pages 9-15, New York, Aug. 1998.
[21] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. New York: John Wiley & Sons, 1990.
[22] J. H. Ward. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58:236--244, 1963.
[23] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In Proc. 1996 Int. Conf. Knowledge Discovery and Data Mining(KDD’96), pages 226-231, Portland, OR, Aug. 1996
[24] M. Ankerst, M. Breunig, H.-P. Kriegel, and J.Sander. OPTICS: Ordering points to identify the clustering structure. In Proc. 1999 ACM-SIGMOD Int Conf. Management of Data (SIGMOD’99), pages 49-60, Philadelphia, PA, June 1999.
[25] A. Hinneburg and D. A. Keim. An efficient approach to clustering in large multimedia databases with noises. In Proc. 1998 Int. Conf. Knowledge Discovery and Data Mining(KDD’98), pages 58-65, New York, Aug. 1998
[26] W. Wang, J. Yang, and R. Muntz. STING: A statistical information grid approach to spatial data mining. In Proceedings of the 23rd VLDB Conference, pages 186--195, Athens, Greece, 1997.
[27] C. Sheikholeslami, S. Chatterjee, A. Zhang. WaveCluster: A- MultiResolution Clustering Alproach for Very Large Spatial Database. In Proc. 1998 Conf. VLDB, New York, USA, 1998.
[28] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD Conference on Management of Data, Seattle, June 1998.
[29] M. Stonebraker, J. Frew, K. Gardels and J. Meredith. The Sequoia 2000 Storage Benchmark. Proceedings of SIGMOD, pp. 2 — 11, 1993

電子全文

國圖紙本論文

推文
網路書籤
推薦
評分
引用網址
轉寄

top

相關論文
相關期刊
熱門點閱論文

1.	構件整合環境下之次組裝識別模型
2.	混合分配的分群方法比較研究：二階段集群法、潛在類別模式及自組織映射圖的模擬與實證分析
3.	智慧型研究資料階層分類搜尋系統之建置
4.	使用N組連結平均法的階層式自動分群
5.	混合「引力行為與隨機交換」演算法處理類別性資料分群之研究
6.	可修正的二分群集法

1.	王以仁（1998）。時代社會變遷中婚姻與家庭問題之探究，家庭教育雙月刊，1，3-10。
2.	呂玉瑕（1980）。社會變遷中台灣婦女之事業觀：婦女角色意識與就業態度之探討。中央研究院民族學研究所集刊，50，25-65。
3.	呂玉瑕（1994）。城鄉經濟發展與已婚婦女就業──女性邊緣化理論（Female Marginalization）試探。人口學刊，16，107-133。
4.	呂玉瑕（1995）。社會學與性別研究。近代中國婦女史研究，3，177-192。
5.	利翠珊（1999）。已婚女性家庭系統的交會：親情與角色的兩難。中華心理衛生學刊，3，1-26。
6.	李奉儒（1998）。台灣地居家庭倫理的變遷：從傳統到後現代的省思，理論與政策，48，177-189。
7.	成令方（1999）。男性的女性主義者在台灣。當代，142，81-84。
8.	吳明燁（1998）母親就業對於父母角色分工的影響─以育有青少年子女的家庭為例，中大社會文化學報，6，113-139。
9.	林鶴玲、李香潔（1999）。台灣閩、客、外省族群家庭中之性別資源配置。載於人文及社會科學集刊，11（4），475-528。
10.	高淑貴（1997）。漁村婦女在家庭經營決策參與之研究。農業推廣學報，13，19-53。
11.	黃朗文（1996）。已婚兩性對家務分工意識型態之研究。東吳社會學報，7，81-111。
12.	傅佩榮（1994）。家庭觀念的演變與調適，教師天地，72，10-21。
13.	簡春安（1994），變遷社會中台灣地區的婚姻與家庭，研考雙月刊，31-45。
14.	謝小芩（1999）。男性如何可能成為女性主義思考主體，當代，142，88-89。

1.	球型群組為基礎之漸入階層式資料分群演算法特性之研究
2.	以影響函數為基礎之階層式分群演算法
3.	特殊領域文件分群之系統設計與研究─以佛學資料為例
4.	以國畫手法自動描繪三維動物模型
5.	以密度為基準，壓縮後資料之離均點偵測方法
6.	網路構建與路由演算法於MANET中的效能比較
7.	服務業e化成功策略之探討--以餐旅業為例
8.	後PC時代電子產品開發方法之研究
9.	協同合作式社會網絡篩選：結合網頁結構探勘與協同合作式篩選的個人化機制
10.	電子商業網站產品推薦系統的研究與實作
11.	以LDAP為後端之DNS設計與實作
12.	低功率元件設計計數器與除法器
13.	以程式碼縮減為基礎的記憶體最佳化
14.	認證語音完整性之數位浮水印技術
15.	以服務品質收費網路之重新繞送演算法

簡易查詢 | 進階查詢 | 熱門排行 | 我的研究室