(3.236.231.14) 您好!臺灣時間:2021/04/12 00:30
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:陸雲泰
研究生(外文):Yun-Tai Lu
論文名稱:一個以Hilbertcurve為基礎的有效支援大型空間資料分群方法
論文名稱(外文):An Efficient Hilbert Curve-basedClustering Strategy for Large Spatial Databases
指導教授:張玉盈
指導教授(外文):Ye-In Chang
學位類別:碩士
校院名稱:國立中山大學
系所名稱:資訊工程學系研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2003
畢業學年度:91
語文別:英文
論文頁數:82
中文關鍵詞:資料分群空間曲線階層式分群方法空間資料探勘格子基礎分群方法
外文關鍵詞:hierarchical clusteringgrid-based clusteringclusteringspace-filling curvespatial data mining
相關次數:
  • 被引用被引用:0
  • 點閱點閱:169
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:4
近來,數以百萬計的資料庫正在被使用,因此我們需要一種新的技術,它能自動的將原始資料轉換成為有用的資訊及知識。資料挖掘就是一種可以從資料庫中分析原始資料,將隱含的、先前並不知道的的資訊從資料庫中萃取出來的技術。而空間資料挖掘是資料挖掘中的一個分支,它主要用在處理空間資料。在空間資料挖掘中,資料分群 (data clustering) 乃是從原始資料中發掘出我們有興趣的資料,是一種非常有用的技術。資料分群是將一群存在 d 維空間的 n 個資料,根據分群的原則,將資料分成 k 個群組。這個原則乃是:讓群組裡的資料相似度達最大,群組間的相似度達最小。群組分析已經被應用在非常多的領域,例如醫藥、社會研究、生物資訊、地圖分析和網際地理資訊系統 (GIS)。在近年來,許多的研究人員專注於找到一個有效的方法,可以快速的解決資料分群的問題。一般來說,我們可以將資料分群的演算法分成四大類:分割式、階層式、密度基礎和格子基礎的分群方法。k-means演算法是屬於分割式分群方法,它是一個非常普遍且廣泛被使用的資料分群方法。但是k-means演算法主要的缺點,乃在於它的分群品質受到開始選擇的參數k很大的影響,並且它只適用於發掘圓形的群組。此外,k-means演算法要花很多的時間來做運算,而無法處理大量的資料。因此,在此篇論文研究中,我們針對處理大量的空間資料庫提出一個新的演算法,它結合了階層式分群的方法和格子基礎的分群方法的資料結構。我們使用格子基礎的分群方法,是因為它可以有效地處理大量的資料。此外,我們使用階層式分群方法,來結合這些格子,來幫助我們找出真實的群組出來。基本上,我們利用了the Hilbert curve以提供一種規則順序來儲存我們的資料。The Hilbert curve是space-filling curve的其中一種,space-filling curve是一種連續的路徑。這些路徑經過在空間中的每一筆資料,並且以一對一的對應關係將高維度的資料轉換成一維的號碼順序。而使用space-filling curve的主要目的是保留空間中的距離關係,若資料在二維空間中相近,則儘量將這些資料存到相近的一維空間磁碟中。這種對應關係可以幫助我們減少磁碟的存取次數和加快資料分群的速度。我們所提出的新演算法只需輸入一個參數,並且能提供使用者一個適當的值。在我們的模擬研究中,我們可以發現我們所提出的演算法在處理大量的資料時,執行時間比其它演算法來的少。而當我們所處理的資料量增加時,我們演算法的執行時間增加的非常緩慢。最後,我們的演算法可以處理一些任意形狀的群組,而這些群組是k-means演算法所不能發掘出來的。
Recently, millions of databases have been used and we need a new technique that can automatically transform the processed data into useful information and knowledge. Data mining is the technique of analyzing data to discover previously unknown information and spatial data mining is the branch of data mining that deals with spatial data. In spatial data mining, clustering is one of useful techniques for discovering interesting data in the underlying data objects. The problem of clustering is that give n data points in a d-dimensional metric space, partition the data points into k clusters such that the data points within a cluster are more similar to each other than data points in different clusters. Cluster analysis has been widely applied to many areas such as medicine, social studies, bioinformatics, map regions and GIS, etc. In recent years, many researchers have focused on finding efficient methods to the clustering problem. In general, we can classify these clustering algorithms into four approaches: partition, hierarchical, density-based, and grid-based approaches. The k-means algorithm which is based on the partitioning approach is probably the most widely applied clustering method. But a major drawback of k-means algorithm is that it is difficult to determine the parameter k to represent ''natural'''' cluster, and it is only suitable for concave spherical clusters. The k-means algorithm has high computational complexity and is unable to handle large databases. Therefore, in this thesis, we present an efficient clustering algorithm for large spatial databases. It combines the hierarchical approach with the grid-based approach structure. We apply the grid-based approach, because it is efficient for large spatial databases. Moreover, we apply the hierarchical approach to find the genuine clusters by repeatedly combining together these blocks. Basically, we make use of the Hilbert curve to provide a way to linearly order the points of a grid. Note that the Hilbert curve is a kind of space-filling curves, where a space-filling curve is a continuous path which passes through every point in a space once to form a one-one correspondence between the coordinates of the points and the one-dimensional sequence numbers of the points on the curve. The goal of using space-filling curve is to preserve the distance that points which are close in 2-D space and represent similar data should be stored close together in the linear order. This kind of mapping also can minimize the disk access effort and provide high speed for clustering. This new algorithm requires only one input parameter and supports the user in determining an appropriate value for it. In our simulation, we have shown that our proposed clustering algorithm can have shorter execution time than other algorithms for the large databases. Since the number of data points is increased, the execution time of our algorithm is increased slowly. Moreover, our algorithm can deal with clusters with arbitrary shapes in which the k-means algorithm can not discover.
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1. Introduction
1.1 Spatial Data Mining
1.2 Clustering
1.3 Space Filling Curves
1.4 Motivations
1.5 Organization of Thesis
2. Survey
2.1 K-Means
2.2 CLARANS
2.3 HAC
2.4 CURE
2.5 DBSCAN
2.6 STING
3. The Clustering Algorithm Based on the Hilbert Curve
3.1 The Basic Idea
3.2 The Algorithm
3.2.1 The First Round
3.2.3 The Second Round
3.2.3 The Third Round
3.2.4 The Fourth Round
4. Performance
4.1 Performance Measures
4.2 The Simulation Model
4.3 Simulation Results
4.3.1 Time Scalability
4.3.2 Sensitivity to Parameters
4.3.3 Special Data Sets
5. Conclusion
5.1 Summary
5.2 Future Work
R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan,
"Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications,"
ACM SIGMOD Conf.,
pp. 94-105, 1998.
M. Ankerst, M. M. Breunig, H. P. Kriegel, and J. Sander,
"OPTICS: Ordering Points To Identify the Clustering Structure,"
ACM SIGMOD Conf.,
pp. 49-60, 1999.
A. Ben-Dor, R. Shamir, and Z. Yakhini,
"Clustering Gene Expression Patterns,"
Proc. of the 3rd Annual Int. Conf. on Computational Molecular Biology,
pp. 33-42, 1999.
P. Berkhin,
"Survey of Clustering Data Mining Techniques,"
Accrue Software, Inc.,
2002.
M. S. Chen, J. Han, and P. S. Yu,
"Data Mining: An Overview from Database Perspective,"
IEEE Trans. on Knowledge and Data Eng.,
Vol. 8, No. 6, pp. 866-883, Dec. 1996.
M. Ester, H. P. Kriegel, J. Sander, and X. Xu,
"A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,"
Proc. of the 2nd Int. Conf. on KDD,
pp. 226-231, 1996.
Christos Faloutsos,
"Gray Codes for Partial Match and Range Queries,"
IEEE Trans. on Software Eng.,
Vol. 14, No. 10, pp. 1381-1393, Aug. 1988.
Christos Faloutsos and Shari Roseman,
"Fractals for Secondary Key Retrieval,"
ACM SIGACT-SIGMOD-SIGART Symposium on PODS,
pp. 247-252, 1989.
S. Guha, R. Rastogi, and K. Shim,
"ROCK: A Robust Clustering Algorithm for Categorical,"
Int. Conf. on Data Eng.,
pp. 512-521, 1999.
S. Guha, R. Rastogi, and K. Shim,
"CURE: An Efficient Clustering Algorithm for Large Databases,"
Information Systems,
Vol. 26, No. 1, pp. 35-58, March 2001.
H. V. Jagadish,
"Linear Clustering of Objects with Multiple Attributes,"
ACM SIGMOD Conf.,
pp. 332-342, 1990.
A. K. Jain and R. C. Dubes,
"Algorithms for Clustering Data,"
Prentice Hall, Englewood Cliffs,
New Jersey, 1988.
A. K. Jain, M. N. Murty, and P. J. Flynn,
"Data Clustering: A Review,"
ACM Computing Survey,
Vol. 31, No. 3, pp. 264-323, Sept. 1999.
G. Karypis, E. H. Han, and V. Kumar,
"CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling,"
IEEE Trans. on Computer,
Vol. 32, No. 8, pp. 68-75, Aug. 1999.
L. Kaufman and P. J. Rousseeuw,
"Finding Groups in Data: An Introduction to Cluster Analysis,"
John Wiley & Sons, Inc.,
New York, 1990.
E. Kolatch,
"Clustering Algorithms for Spatial Databases: A Survey,"
Dept. of Computer Science, University of Maryland, College Park,
2001.
B. Lent, A. Swami, and J. Widom,
"Clustering Association Rules,"
Proc. of the 13th Int. Conf. on Data Eng.,
pp. 220-231, 1997.
BongKi Moon, H. V. Jagadish, Christos Faloutsos, and Joel H. Saltz,
"Analysis of the Clustering Properties of the Hilbert Space-Filling Curve,"
IEEE Trans. on Knowledge and Data Eng.,
Vol. 13, No. 1, pp. 124-141, Jan. 2001.
R. T. Ng and J. Han,
"Efficient and Effective Clustering Methods for Spatial Data Mining,"
Proc. of the 20th VLDB Conf.,
pp. 144-155, 1994.
R. T. Ng and J. Han,
"CLARANS: A Method for Clustering Objects for Spatial Data Mining,"
IEEE Trans. on Knowledge and Data Eng.,
Vol. 14, No. 5, pp. 1003-1016, Sept./Oct. 2002.
C. F. Olson,
"Parallel Algorithms for Hierarchical Clustering,"
Parallel Computing,
Vol. 21, No. 8, pp. 1313-1325, Aug. 1995.
Jack A. Orenstein and T. H. Merrett,
"A Class of Data Structures for Associative Searching,"
Proc. Symp. on PODS,
pp. 181-190, 1984.
Jack A. Orenstein,
"Spatial Query Processing in an Object-Oriented Database System,"
Proc. ACM SIGMOD Int. Conf. on Management of Data,
pp. 326-336, 1986.
G. Sheikholeslami, S. Chatterjee, and A. Zhang,
"WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases,"
Proc. of the 24th VLDB Conf.,
pp. 428-439, 1998.
L. Y. Tseng and S. B. Yang,
"A Genetic Approach to the Automatic Clustering Problem,"
Pattern Recognition,
Vol. 34, No. 2, pp. 415-424, Feb. 2001.
E. M. Voorhees,
"Implementing Agglomerative Hierarchical Clustering Algorithms for Use in Document Retrieval,"
Information Proc.& Management,
pp. 465-476, 1986.
W. Wang, J. Yang, and R. Muntz,
"STING: A Statistical Information Grid Approach to Spatial Data,"
Proc. of the 23th VLDB Conf.,
pp. 186-195, 1997.
C. P. Wei, Y. H. Lee, and C. M. Hsu,
"Empirical Comparison of Fast Clustering Algorithms for Large Data Sets,"
Proc. of the 33rd Hawaii Int.~Conf.~on System Sciences,
Maui, Hawaii, Jan. 2000.
O. R. Zaiane, A. Foss, C. H. Lee, and W. wang,
"On Data Clustering Analysis: Scalability, Constraints, and Validation,"
Pacific-Asia Conf. on Knowledge Discovery and Data Mining,
pp. 28-39, 2002.
T. Zhang, R. Ramakrishnan, and M. Livny,
"BIRCH: A Efficient Data Clustering Method for Very Large Databases,"
ACM SIGMOD Int. Conf. on Management of Data,
pp. 103-114, 1996.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊
 
系統版面圖檔 系統版面圖檔