臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record
Author: 洪振富 (Zhen-fu Hong)
Title: 距離式特徵於資料自動分類之研究 (Distance Features in Automatic Data Classification)
Advisor: 蔡志豐 (Chih-fong Tsai)
Degree: Master's
Institution: 國立中央大學 (National Central University)
Department: Graduate Institute of Information Management (資訊管理研究所)
Field: Computing (電算機學門)
Discipline: General Computing (電算機一般學類)
Thesis type: Academic thesis
Year of publication: 2010
Academic year of graduation: 98 (2009-2010)
Language: Chinese
Pages: 68
Keywords (Chinese): 分群 (clustering); 分類 (classification); 特徵萃取 (feature extraction)
Keywords (English): feature extraction; classification; clustering
Usage statistics:
  • Cited by: 14
  • Views: 257
  • Downloads: 0
  • Bookmarked: 3
In data mining and pattern classification, feature extraction and representation are very important steps, since the extracted features have a direct and significant impact on classification accuracy. In the literature, a number of novel feature extraction and representation methods have been proposed; however, many of them focus only on specific domain problems. In this thesis, we introduce a novel distance-based feature extraction method for various pattern classification problems. Specifically, three distances are extracted, based on the distance between a data point and its intra-cluster center and the distances between the data point and the extra-cluster centers. Experiments are conducted on ten datasets containing different numbers of classes, samples, and dimensions. The results, obtained with naïve Bayes, k-NN, and SVM classifiers over datasets whose original features are concatenated with the distance-based features, show that the distance-based features improve classification accuracy, except on image-related datasets. In particular, they are most suitable for datasets with fewer classes, fewer samples, and lower-dimensional features. Moreover, two further datasets with these characteristics are used to validate this finding; the result is consistent with the first experiment, i.e., adding the distance-based features improves classification performance.
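The distance-based feature extraction described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's exact procedure: the thesis defines its three distances in Chapter 3, so the particular choices here (distance to the own-cluster center, to the nearest extra-cluster center, and the mean extra-cluster distance) are assumptions for illustration, and the plain k-means routine is likewise a stand-in.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means: pick k random samples as initial centers, then
    alternate nearest-center assignment and center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def distance_features(X, centers, labels):
    """Three hypothetical distance features per sample: Euclidean distance
    to its own cluster center, to the nearest other center, and the mean
    distance to all other centers."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    own = dists[np.arange(len(X)), labels]
    mask = np.arange(len(centers))[None, :] == labels[:, None]
    extra = np.where(mask, np.nan, dists)  # hide the own-cluster column
    return np.column_stack([own,
                            np.nanmin(extra, axis=1),
                            np.nanmean(extra, axis=1)])

# Two well-separated Gaussian blobs as toy data.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])
centers, labels = kmeans(X, k=2)
# Concatenate the original features with the distance-based ones.
X_aug = np.hstack([X, distance_features(X, centers, labels)])
print(X_aug.shape)  # (40, 5): 2 original + 3 distance features
```

In the thesis's pipeline, the augmented matrix (original features plus distance features) would then be fed to the naïve Bayes, k-NN, or SVM classifiers under k-fold cross-validation.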
Chinese Abstract
English Abstract
Table of Contents
Chapter 1: Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives and Expected Benefits
1.4 Scope and Assumptions
1.5 Research Procedure
Chapter 2: Literature Review
2.1 Data Mining
2.1.1 Definition of Data Mining
2.1.2 The Data Mining Process
2.1.3 Functions of Data Mining
2.2 Data Mining Techniques
2.2.1 Clustering
2.2.1.1 The K-means Clustering Algorithm
2.2.2 Classification
2.2.2.1 Support Vector Machines
2.2.2.2 K-Nearest Neighbor (KNN)
2.3 Feature Selection
2.3.1 Principal Component Analysis (PCA)
2.4 Related Work
Chapter 3: Research Methodology
3.1 Research Process
3.2 Computing Cluster Centers and Intra-/Extra-cluster Distances
3.2.1 The Euclidean Distance Formula
3.2.2 Intra-cluster Distance
3.2.3 Extra-cluster Distance
3.3 Training and Testing
Chapter 4: Experimental Results
4.1 Experimental Design
4.1.1 Datasets
4.1.2 Classifiers
4.1.3 K-fold Cross-Validation
4.1.4 PCA for Feature Selection
4.2 PCA Evaluation
4.3 Classification Accuracy
4.4 Discussion
4.5 Validation of Results
Chapter 5: Conclusions and Suggestions
5.1 Conclusions
5.2 Future Work and Suggestions
5.3 Research Limitations
References
Appendix A
Chinese references
林嘉陞, 2009. CANN: an intrusion detection system integrating cluster centers and nearest neighbors, Master's thesis, National Chung Cheng University.
English references
Baralis, E., Chiusano, S., 2004. Essential classification rule sets, ACM Transactions on Database Systems (TODS), Vol. 29, No. 4, pp.635-674.
Belkin, N.J., Croft, W.B., 1992. Information filtering and information retrieval: two sides of the same coin, Communications of the ACM, Vol. 35, No. 12, pp.29-38.
Berry, M.J., Linoff, G., 1997. Data Mining Techniques: for Marketing, Sales, and Customer Support. NY: John Wiley & Sons.
Blum, A., Langley, P., 1997. Selection of relevant features and examples in machine learning, Artificial Intelligence, Vol. 97, No. 1-2, pp.245-271.
Burges, C.J.C., 1998. A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp.121-167.
Campbell, J. B., 2003. Introduction to remote sensing, 3rd Edition, London: Taylor and Francis.
Canbas, S., Cabuk, A., Kilic, S.B., 2005. Prediction of commercial bank failure via multivariate statistical analysis of financial structures: The Turkish case, European Journal of Operational Research, pp.528-546.
Chang, C.C., Lin, C.J., 2001. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chapelle, O., Vapnik, V., Bousquet, O., Mukherjee, S., 2002. Choosing multiple parameters for support vector machines. Machine Learning, Vol. 46, pp.131-159.
Chen, Y., Garcia, E.K., Gupta, M.R., Rahimi, A., Cazzanti, L., 2009. Similarity-based Classification: Concepts and Algorithms, Journal of Machine Learning Research, Vol. 10, pp.747-776.
Choi, E., Lee, C., 2003. Feature extraction based on the Bhattacharyya distance, Pattern Recognition, Vol. 36, pp.1703-1709.
Cover, T.M., Hart, P.E., 1967. Nearest neighbor pattern classification, IEEE Transactions on Information Theory, Vol. 13, pp.21-27.
Dash, M., Liu, H., Xu, X., 2001. ''1+1>2'': Merging Distance and Density Based Clustering, Proceedings of the 7th International Conference on Database Systems for Advanced Applications, pp.32-39.
De la Torre, F., Black, M.J., 2001. Robust principal component analysis for computer vision, International Conference on Computer Vision, pp. 362-369.
Duda, R.O., Hart, P.E., Stork, D.G., 2001. Pattern Classification, 2nd Edition, John Wiley, New York.
Dy, J.G., Brodley, C.E., 2004. Feature selection for unsupervised learning, Journal of Machine Learning Research, Vol. 5, pp.845-889.
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., 1996. From Data Mining to Knowledge Discovery in Databases, AI Magazine, pp.37-54.
Foody, G. M., Mathur, A., 2004. A relative evaluation of multiclass image classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing, Vol. 42, pp.1335-1343.
Frawley, W. J., Piatetsky-Shapiro, G. S., Matheus, C. J., 1991. Knowledge Discovery in Databases: An Overview, Knowledge Discovery in Database, AAAI Press, Menlo Park, CA, pp.1-27.
Guha, S., Rastogi, R., Shim, K., 1998. CURE: An Efficient Clustering Algorithm for Large Databases, Proceedings of the ACM SIGMOD Conference, pp.73-84.
Han, J., Kamber, M., 2001. Data Mining: Concepts and Techniques, 2nd Edition., Morgan Kaufmann Publishers, USA.
Hand, D., Mannila, H., Smyth, P., 2001. Principles of Data Mining, MIT Press, Cambridge, MA.
Hotelling, H., 1933. Analysis of a Complex of Statistical Variables into Principal Components, J. Educational Psychology, Vol. 24, pp. 498-520.
Jain, A.K., Duin, R.P.W., Mao, J., 2000. Statistical pattern recognition: a review, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, pp.4-37.
James, M., 1985. Classification algorithms, John Wiley & Sons, Inc.
Keerthi, S., Chapelle, O., DeCoste, D., 2006. Building support vector machines with reducing classifier complexity, Journal of Machine Learning Research, Vol. 7, pp.1493-1515.
Kohavi, R., 1995. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, Vol. 2, pp.1137-1145.
Koller, D., Sahami, M., 1996. Toward optimal feature selection, Proceedings of the Thirteenth International Conference on Machine Learning, pp. 284–292.
Kudo, M., Masuyama, N., Toyama, J., Shimbo, M., 2003. Simple termination conditions for k-nearest neighbor method, Pattern Recognition Letters, Vol. 24, pp.1203-1213.
Liu, H., Motoda, H., 1998. Feature Selection for Knowledge Discovery and Data Mining, Boston: Kluwer Academic Publishers.
Liu, Y., Zheng, Y.F., 2006. FS_SFS: A novel feature selection method for support vector machines, Pattern Recognition, Vol. 39, pp.1333-1345.
MacQueen, J.B., 1967. Some methods for classification and analysis of multivariate observations, Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, CA, pp.281-297.
Min, S.H., Lee, J., Han, I., 2006. Hybrid genetic algorithms and support vector machines for bankruptcy prediction, Expert Systems with Applications, Vol. 31, pp.652-660.
Mingoti, S.A., Lima, J.O., 2006. Comparing SOM neural network with Fuzzy c-means, K-means and traditional hierarchical clustering algorithms, European Journal of Operational Research, Vol. 174, No. 3, pp.1742-1759.
Oh, I.S., Lee, J.S., Moon, B.R., 2004. Hybrid Genetic Algorithms for Feature Selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 11, pp.1424-1437.
Patcha, A., Park, J.M., 2007. An overview of anomaly detection techniques: Existing solution and latest technological trends. Computer Networks, Vol. 51, pp.3448-3470.
Pearson, K., 1901. On Lines and Planes of Closest Fit to System of Points in Space, Philosophical Magazine, Vol. 2, pp. 559-572.
Pechenizkiy, M., Puuronen, S., Tsymbal, A., 2006. The impact of sample reduction on pca-based feature extraction for supervised learning. Proceedings of the Symposium on Applied Computing (SAC), pp. 553-558.
Roiger, R.J., Geatz, M.W., 2003. Data Mining: A Tutorial-Based Primer. Addison-Wesley.
San, O.M., Huynh, V., Nakamori, Y., 2004. An alternative extension of the K-means algorithm for clustering categorical data, International Journal of Applied Mathematics and Computer Science, Vol. 14, pp.241-247.
Smola, A.J., 1998. Learning with kernels, Ph.D. thesis, Technische Universitat Berlin.
Stone, M., 1974. Cross-Validatory Choice and Assessment of Statistical Predictions, Journal of the Royal Statistical Society B, Vol. 36, No. 1, pp.111-147.
Ha, S.H., Park, S.C., 1998. Application of Data Mining Tools to Hotel Data Mart on the Intranet for Database Marketing, Expert Systems with Applications, Vol. 15, pp.1-31.
Tian, J., Zhu, L., Zhang, S., Liu, L., 2005. Improvement and parallelism of K-means clustering algorithm, Tsinghua Science and Technology, Vol. 10, No. 3, pp.277-281.
Tsai, C.F., Lin, C.Y., 2010. A triangle area based nearest neighbors approach to intrusion detection, Pattern Recognition, Vol. 43, pp. 222-229.
Turk, M., Pentland, A., 1991. Eigenfaces for recognition, Journal of Cognitive Neuroscience, Vol. 3, pp.71-86.
Vapnik, V.N., 1995. The Nature of Statistical Learning Theory. Springer, New York.
Xue, Z., Li, S.Z., Teoh, E.K., 2003. Bayesian shape model for facial feature extraction and recognition, Pattern Recognition, Vol. 36, pp.2819-2833.
Yang, J.H., Honavar, V., 1998. Feature Subset Selection Using a Genetic Algorithm, IEEE Intelligent Systems, Vol. 13, No. 2, pp. 44-49.
Zeng, W., Meng, X.X., Yang, C.L., Huang, L., 2006. Feature extraction for online handwritten characters using Delaunay triangulation, Computers & Graphics, Vol. 30, pp.779-786.