National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

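Author: 張耿維 (Keng-Wei Chang)
Thesis title: Extended Naïve Bayes Classifier for Mixed Data (改良式質樸貝氏分類器分類混合型資料)
Advisor: 許中川 (Chung-Chian Hsu)
Degree: Master's
Institution: National Yunlin University of Science and Technology (國立雲林科技大學)
Department: Master's Program, Department of Information Management
Discipline: Computer Science
Subdiscipline: General Computer Science
Thesis type: Academic thesis
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: Chinese
Number of pages: 49
Keywords (translated from Chinese): hypothesis test for two population means; classification; discretization; mixed data; Naïve Bayes classifier
Keywords (English): statistic test; mixed data; classification; discretization; Naïve Bayes Classifier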
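Among current classification algorithms, the Naïve Bayes classifier (hereafter the Bayes classifier), an inductive algorithm, has shown good accuracy on many classification problems. It classifies under the assumption that attributes are conditionally independent, and it uses the joint information carried by many attributes as the basis for its final prediction. Although this assumption does not hold in practice, studies have confirmed that the Bayes classifier still solves classification problems effectively. In terms of efficiency, the Bayes classifier is quite fast, needing only a single scan of the data set before it can make predictions. Despite these advantages, the traditional Bayes classifier is restricted to categorical data. In other words, it cannot be applied to classification of data that contains both categorical and numeric attributes. Some earlier papers discretized numeric attributes into categorical ones before applying the Bayes classifier, but classification performance is then easily affected by the choice of discretization criterion. Later papers proposed handling numeric data with the Gaussian distribution, but estimating the population distribution from a single point tends to produce a high rate of erroneous probability estimates. The traditional Bayes classifier therefore still cannot handle the classification of mixed data in a reasonable way. This study extends the traditional Bayes classifier to mixed categorical and numeric data and calls the result the extended Bayes classifier. Using statistical theory to handle the classification probabilities of numeric attributes, we developed an algorithm capable of handling mixed data. Experimental results show that the proposed classifier achieves good classification performance and execution efficiency compared with three other classification algorithms: classification and regression trees, decision trees, and multilayer perceptrons.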
The Naïve Bayes induction algorithm has been shown to be surprisingly accurate on many classification tasks, even when the conditional independence assumption on which it is based is violated. Naïve Bayes classifiers are very robust to irrelevant attributes, and classification takes into account evidence from many attributes to make the final prediction. They are generally easy to understand, and their induction is extremely fast, requiring only a single pass through the data. However, the algorithm is limited to categorical or discrete data. In other words, it cannot be applied directly to mixed data, which includes both categorical and numeric attributes. The traditional method for dealing with numeric data is to discretize numeric attributes into symbols, but the choice of discretization criterion has a significant effect on performance. Several studies have also employed the normal distribution to handle numeric data, but using only one value to estimate the population easily leads to incorrect estimates. Hence, research on classifying mixed data with Naïve Bayes classifiers has not been very successful. In this paper, we propose a classification method, extended Naïve Bayes (ENB), which is capable of handling mixed data. For categorical data, we use the original Naïve Bayes approach to calculate the probabilities of categorical values. For continuous data, we adopt statistical theory, taking into account not only the mean but also the variance of the numeric values. For an unknown input pattern, the product of the probabilities and the P-values is calculated, and the class that yields the maximum product is designated as the target class of the input pattern. The experimental results demonstrate the efficiency of our algorithm in comparison with other classification algorithms such as CART, DT, MLP, and NBG.
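The decision rule described in the abstract (multiply categorical probabilities by P-values for numeric attributes, then pick the class with the largest product) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual algorithm: this record does not give the exact test statistic, so the sketch substitutes a two-sided z-test of the input value against each class's sample mean and standard deviation. The Laplace smoothing, the inclusion of the class prior, and the names `fit`, `predict`, and `two_sided_p` are likewise hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict


def two_sided_p(x, mean, std):
    """Two-sided P-value of x under N(mean, std^2) (assumed test, not the thesis's)."""
    z = abs(x - mean) / std
    return math.erfc(z / math.sqrt(2))


def fit(rows, labels, cat_idx, num_idx):
    """Collect class counts, categorical value counts, and numeric mean/std per class."""
    model = {
        "classes": Counter(labels),
        "cat": defaultdict(Counter),   # (class, attribute) -> value counts
        "values": defaultdict(set),    # attribute -> distinct values seen
        "num": {},                     # (class, attribute) -> (mean, std)
    }
    grouped = defaultdict(list)
    for row, y in zip(rows, labels):
        grouped[y].append(row)
        for j in cat_idx:
            model["cat"][(y, j)][row[j]] += 1
            model["values"][j].add(row[j])
    for y, members in grouped.items():
        for j in num_idx:
            xs = [r[j] for r in members]
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / max(len(xs) - 1, 1)
            # Guard against zero spread so the z-score stays defined.
            model["num"][(y, j)] = (mean, math.sqrt(var) or 1e-9)
    return model


def predict(model, row, cat_idx, num_idx):
    """Return the class maximizing prior * categorical probabilities * P-values."""
    total = sum(model["classes"].values())
    best, best_score = None, -1.0
    for y, n_y in model["classes"].items():
        score = n_y / total  # class prior (assumed, as in standard Naive Bayes)
        for j in cat_idx:
            k = len(model["values"][j])
            # Laplace-smoothed conditional probability (smoothing is an assumption).
            score *= (model["cat"][(y, j)][row[j]] + 1) / (n_y + k)
        for j in num_idx:
            mean, std = model["num"][(y, j)]
            score *= two_sided_p(row[j], mean, std)
        if score > best_score:
            best, best_score = y, score
    return best


if __name__ == "__main__":
    rows = [("red", 1.0), ("red", 1.2), ("blue", 5.0), ("blue", 5.5)]
    labels = ["a", "a", "b", "b"]
    model = fit(rows, labels, cat_idx=[0], num_idx=[1])
    print(predict(model, ("red", 1.1), cat_idx=[0], num_idx=[1]))
```

Unlike a Gaussian density evaluated at a single point, a P-value shrinks toward zero as the input moves away from the class mean relative to the class spread, which is one plausible reading of the abstract's remark about using the variance as well as the mean.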
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
1. Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Scope and Limitations
1.4 Thesis Organization
2. Literature Review
2.1 Classification Algorithms
2.1.1 Classification and Regression Trees
2.1.2 Decision Trees
2.1.3 Multilayer Perceptron
2.2 Naïve Bayes Classifier
2.3 Comparison of Classification Algorithms
2.4 Discretization of Numeric Attributes
2.5 Gaussian Distribution for Numeric Attributes
3. Methodology
3.1 Problem Definition
3.2 Categorical Attributes: The Traditional Bayes Classifier
3.3 Numeric Attributes: Hypothesis Test for Equality of Two Population Means
3.4 P-Value Mixed-Data Bayes Classifier
3.5 Algorithm Steps
3.6 Exponential Mixed-Data Bayes Classifier
3.7 Worked Example of the Procedure
3.8 Evaluating Classification Results
4. Experiments
4.1 Parameter Experiments for the P-Value Mixed-Data Bayes Classifier
4.2 Experimental Results
4.3 Analysis of Correct and Incorrect Classification Proportions
4.4 Stability Analysis of Classification Performance
4.5 Execution Efficiency Analysis
5. Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References