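Student: Shan-Yi Liao (廖尚儀)
Thesis title: Data Types Generalization for Data Mining Algorithms (針對資料探勘技術的資料型態整合)
Advisor: Shian-Shyong Tseng (曾憲雄)
Degree: Master's
Institution: National Chiao Tung University
Department: Department of Computer and Information Science
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 1999
Graduation academic year: 87 (ROC calendar, i.e., 1998–1999)
Language: English
Number of pages: 36
Keywords: Data Mining; Data Types; Merging Phase; Transforming Phase; Generalization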

With the growing number of database applications, mining interesting information from huge databases has become a major concern, and a variety of mining algorithms have been proposed in recent years. The data processed in data mining may be obtained from many sources, and these sources may use different data types. However, no single algorithm can be applied to all applications, because each algorithm fits only certain data types; the selection of an appropriate mining algorithm therefore depends not only on the goal of the application but also on data fittability. Transforming non-fitting data types into target ones is thus also important work in data mining, but this work is often tedious and complex because so many data types exist in the real world. Merging similar data types into a generalized data type seems to be a good approach to reducing the transformation complexity. In this work, a data types generalization process consisting of a merging phase and a transforming phase is proposed. In the merging phase, the original data types of the data sources to be mined are first merged into generalized data types. The transforming phase then converts the generalized data types into the target data types required by the selected mining algorithm. With the data types generalization process, users can select an appropriate mining algorithm based solely on the goal of the application, without having to consider data types.
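The two phases described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes, as the chapter 4 outline suggests, that the generalized types are "discrete" and "continuous", and the SQL-style source type names in `MERGE_TABLE`, the algorithm names, and their required input types in `ALGORITHM_INPUT` are hypothetical choices made for illustration. The merging phase maps each attribute's source type to a generalized type; the transforming phase then records which conversion, if any, each attribute needs for the selected algorithm.

```python
from enum import Enum


class GenType(Enum):
    """Generalized data types produced by the merging phase (assumed: two kinds)."""
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


# Merging phase: hypothetical mapping from SQL-style source types to generalized types.
MERGE_TABLE = {
    "CHAR": GenType.DISCRETE,
    "VARCHAR": GenType.DISCRETE,
    "BOOLEAN": GenType.DISCRETE,
    "INTEGER": GenType.CONTINUOUS,
    "FLOAT": GenType.CONTINUOUS,
    "DECIMAL": GenType.CONTINUOUS,
}

# Hypothetical input requirements of two kinds of mining algorithms.
ALGORITHM_INPUT = {
    "association_rules": GenType.DISCRETE,
    "neural_network": GenType.CONTINUOUS,
}


def merge_phase(schema):
    """Map each attribute's source type (case-insensitive) to a generalized type."""
    return {attr: MERGE_TABLE[src.upper()] for attr, src in schema.items()}


def transform_plan(generalized, algorithm):
    """Transforming phase: record the conversion each attribute needs, or 'keep'."""
    target = ALGORITHM_INPUT[algorithm]
    return {
        attr: "keep" if gen is target else f"{gen.value} -> {target.value}"
        for attr, gen in generalized.items()
    }


if __name__ == "__main__":
    schema = {"age": "INTEGER", "city": "VARCHAR", "income": "DECIMAL"}
    plan = transform_plan(merge_phase(schema), "association_rules")
    print(plan)
    # {'age': 'continuous -> discrete', 'city': 'keep', 'income': 'continuous -> discrete'}
```

Here the plan only names the required conversions; concrete transformation strategies, such as those listed in chapter 5 of the outline, would then carry them out.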
In this thesis, the data types fittability problem for six widely used kinds of data mining techniques is discussed, and a complete analysis of it is presented. We also show that, with the proposed data types generalization process, users can transform the data types of the attributes in relations, which solves the data fittability problem for relational databases. Finally, we explain the different kinds of transformation strategies used in the process by giving concise algorithms for them, and we give examples showing that a prototype of the data types generalization process is practical.

1 Introduction
1.1 The problem of data fittability for data mining algorithms
1.2 A new concept: "data types generalization process"
1.3 Organization of the rest of the thesis
2 The data types generalization process and related work
2.1 Six kinds of data mining approaches (background knowledge)
2.2 Data types generalization process vs. preprocessing of data mining
2.2.1 The role of the data types generalization process in KDD
2.2.2 The importance of the data types generalization process
2.2.3 The difference between "data types generalization" and other preprocessing
2.3 Related work
2.3.1 Data generalization using concept hierarchy
2.3.2 Transforming data into numerical values
3 Analysis of the data types fittability problem
3.1 An overview of the data types fittability problem for data mining
3.2 Four kinds of data forms for data mining
3.3 Analysis of the six kinds of data mining techniques
3.4 Analysis of the data types fittability problem for data mining
3.4.1 Data types fittability between different kinds of databases
3.4.2 Data types fittability for each kind of database
4 The data types generalization process
4.1 General idea of the data types generalization process
4.2 The first phase: merging phase
4.2.1 Discrete data type and continuous data type
4.2.2 Comparison of discrete data type and continuous data type
4.3 The second phase: transforming phase
4.3.1 Reducing the distinct values of discrete data type
4.3.2 Discrete data type to continuous data type
4.3.3 Multi-dimensional enumerated data type to continuous data type
4.3.4 Continuous data type to discrete data type
4.4 The detailed flow of the data types generalization process
5 The transformation strategies over generalized data types
5.1 Strategies of reducing the distinct values of discrete data type
5.1.1 Generalization using concept hierarchy
5.1.2 Generalization using data format
5.1.3 Generalization using target attribute value
5.1.4 Generalization using database statistics
5.2 Strategies of discrete data type to continuous data type
5.2.1 Transform small number of distinct values with no obvious ordering
5.2.2 Transform small number of distinct values with sequential nature
5.2.3 Transform data using "Concept Hierarchy"
5.3 Strategies of multi-dimensional enumerated type to continuous data type
5.4 Strategies of continuous data type to discrete data type
6 Concluding remarks
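The outline names several transformation strategies without detailing them here. As a hedged illustration only, the sketch below implements common textbook versions of four of them: generalization through a concept hierarchy (5.1.1), one-hot indicator columns for a few unordered values (5.2.1), rank scaling for values with a sequential nature (5.2.2), and equal-width binning for continuous-to-discrete conversion (5.4). The function names, the example hierarchy, and the choice of these particular techniques are assumptions; the thesis's own algorithms may differ.

```python
def generalize_by_hierarchy(values, parent):
    """Reduce distinct values by replacing each value with its parent concept;
    values missing from the (hypothetical) hierarchy are kept unchanged."""
    return [parent.get(v, v) for v in values]


def one_hot(values):
    """Discrete -> continuous for a few unordered values:
    one 0.0/1.0 indicator column per distinct category (sorted)."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


def ordinal_scale(values, order):
    """Discrete -> continuous for values with a sequential nature:
    map the rank of each value in `order` onto [0, 1]."""
    rank = {v: i for i, v in enumerate(order)}
    top = max(len(order) - 1, 1)  # avoid division by zero for a single level
    return [rank[v] / top for v in values]


def equal_width_bins(values, k):
    """Continuous -> discrete: split [min, max] into k equal-width intervals
    and return each value's interval index (0 .. k-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value lands exactly on the upper edge; clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


if __name__ == "__main__":
    parent = {"Taipei": "Taiwan", "Hsinchu": "Taiwan", "Tokyo": "Japan"}
    print(generalize_by_hierarchy(["Taipei", "Tokyo", "Hsinchu"], parent))
    # ['Taiwan', 'Japan', 'Taiwan']
    print(one_hot(["red", "green", "red"]))
    # (['green', 'red'], [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    print(ordinal_scale(["low", "high", "medium"], ["low", "medium", "high"]))
    # [0.0, 1.0, 0.5]
    print(equal_width_bins([20, 35, 50, 65, 80], 3))
    # [0, 0, 1, 2, 2]
```

One-hot encoding avoids imposing a false order on unordered categories at the cost of one column per value, which is why it suits only a small number of distinct values; rank scaling keeps a single column but is appropriate only when the values really are ordered.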

