National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

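Graduate student: 戴品聰
Graduate student (English name): Piin-Tsong Dai
Thesis title: 一個以密度為基礎的高效率組群化演算法
Thesis title (English): An Efficient Density-based Clustering Algorithm
Advisor: 楊東麟
Advisor (English name): Don-Lin Yang
Degree: Master's
Institution: Feng Chia University (逢甲大學)
Department: Institute of Information Engineering (資訊工程所)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2003
Academic year of graduation: 91
Language: English
Number of pages: 57
Keywords (Chinese): density-based (密度式); cluster parameters (組群參數); data mining (資料挖掘); clustering algorithm (組群化演算法)
Keywords (English): Clustering Algorithm; Data Mining; Density-Based; Cluster Parameters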
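Abstract (translated from the Chinese):
In recent years, clustering algorithms have come to be regarded as a powerful tool in the field of data mining. They have been extended efficiently to large datasets and to applications in a variety of specific domains, including image compression, market segmentation, spatial data discovery, and statistical data analysis.
In this thesis, we propose a new density-based clustering algorithm, Density-Based Clustering using Statistical Partition (DBCSP), to speed up the search for the converged centroids of the final clusters. The main purpose of our research is to understand clustering algorithms and to analyze cluster characteristics in order to determine the parameters of clustering algorithms. Previous research has proposed a wide variety of clustering algorithms for large, multidimensional datasets. However, almost all of them require parameters to be given in advance, while users lack the background knowledge needed to choose them. Our research therefore analyzes what the data distribution within clusters means and what the parameters mean for the clusters. We propose a cluster evaluation formula that effectively finds suitable cluster parameters, so that users can skip setting them.
Some clustering applications, such as target market research, may need a specific number of clusters to be specified. To meet this kind of user requirement, we also propose an algorithm that is based on k-means but is more efficient and produces better results, reducing the cost of distance calculations and processing time.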
Abstract (English):
In recent years, clustering algorithms have been recognized as powerful tools for data mining. They have been extended to work efficiently on large datasets and in a variety of application domains, including image compression, market segmentation, spatial data discovery, and statistical data analysis.
In this thesis, we propose a novel algorithm called Density-Based Clustering using Statistical Partition (DBCSP), which builds on density-based clustering to speed up the search for the final converged centroids. The goal of our research is to study clustering algorithms and to analyze the characteristics of clusters in order to set the parameters of clustering algorithms. Previous research has produced many clustering algorithms that can handle high-dimensional datasets. However, almost all of them require input parameters, and most users do not have enough domain knowledge to determine these parameters. Our research therefore focuses on analyzing the meaning of data distributions in clusters and the relation of parameters to their respective clusters. We propose a formula that evaluates the important factors in a cluster to determine whether a dataset is well clustered, which relieves users of setting the parameters themselves.
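To make the parameter problem concrete, the following is a minimal sketch of DBSCAN, a well-known density-based method, and not the DBCSP algorithm proposed in the thesis. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative assumptions. The sketch only shows that the outcome depends on two values the user has to supply.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins but does not expand
                continue
            if labels[j] is not UNVISITED:
                continue                 # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Two tight groups of four points and one isolated point (hypothetical data).
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
```

On this hypothetical data, `dbscan(sample, eps=1.5, min_pts=3)` separates the two groups and marks `(5, 5)` as noise, while `dbscan(sample, eps=0.5, min_pts=3)` labels every point as noise. A poorly chosen radius is enough to lose the structure entirely.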
In some clustering applications, such as target market research, a specific number of clusters needs to be given. For these cases, we also propose a more efficient algorithm based on the k-means method that produces the same or comparable clustering results with much better performance, reducing the total cost of distance calculations and processing time.
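For reference, the following is a minimal sketch of the standard k-means iteration (Lloyd's algorithm), the textbook baseline that an improved method of this kind would be measured against. It is not the thesis's algorithm. The function name `kmeans`, the explicit `init_centroids` argument, and the distance counter are assumptions added for illustration. The counter shows where the cost arises: each iteration computes one distance for every point and centroid pair.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd's k-means. Returns (centroids, labels, distance_count)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = [None] * len(points)
    distance_count = 0
    for _ in range(max_iter):
        # Assignment step: compare every point with every centroid.
        new_labels = []
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            distance_count += len(centroids)
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:         # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:                  # an empty cluster keeps its old centroid
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels, distance_count

# Two well-separated groups of four points (hypothetical data).
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels, n_dist = kmeans(data, init_centroids=[(0, 0), (10, 10)])
```

On this hypothetical data the loop stops after two assignment passes, so `n_dist` is 2 × 8 × 2 = 32. An algorithm that can skip comparisons for points whose assignment cannot change would reduce that count.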
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Data Mining
1.2.1 Classification
1.2.2 Cluster Analysis
1.2.3 Association Rule
1.2.4 Sequential Pattern
1.3 Summary
1.4 Organization of the Thesis
Chapter 2 Related Work
2.1 Density Clustering
2.1.1 DBSCAN Method
2.1.2 OPTICS Method
2.1.3 DBCLASD Method
2.1.4 DENCLUE Method
2.1.5 K-means Method
Chapter 3 Proposed Methods
3.1 Overview
3.2 The Proposed Density-Based Clustering Methods
3.2.1 Data Preprocessing
3.2.2 Partitioning the Dataset into Unit Blocks
3.2.3 Density-Based Clustering
Chapter 4 Experiments and Results
4.1 Dataset Description
4.2 Experimental Results
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References
Acknowledgments
Vita