Author: 連子建
Author (English): Tzu-Chien Lien
Title: 結合混合型離散化和挑選式簡易貝氏特徵選取來改善簡易貝氏分類器正確率之方法 (A method combining hybrid discretization and selective naïve Bayes feature selection to improve the accuracy of naïve Bayesian classifiers)
Title (English): Feature selection methods with hybrid discretization for naive Bayesian classifiers
Advisor: 翁慈宗
Advisor (English): Tzu-Tsung Wong
Degree: Master's
Institution: National Cheng Kung University
Department: Institute of Information Management
Discipline: Computer Science
Field: General Computer Science
Document type: Academic thesis
Year of publication: 2012
Graduation academic year: 100 (2011-2012)
Language: Chinese
Pages: 45
Keywords (Chinese): 簡易貝氏分類器; 混合型離散化; 特徵選取; 挑選式簡易貝氏特徵選取方法
Keywords (English): feature selection; hybrid discretization; naïve Bayesian classifier; selective naïve Bayes
Usage statistics:
  • Cited by: 6
  • Views: 504
  • Downloads: 121
  • Bookmarked: 0
Abstract (Chinese, translated): Among classification tools, the naïve Bayesian classifier is popular because it is simple and fast to run and achieves good classification accuracy. The classifier only realizes its speed advantage when attributes are discrete, so continuous attributes are generally discretized before they are input. Hybrid discretization, which finds a better discrete form for each continuous attribute individually, improves classification accuracy over applying a single discretization method to all attributes. Feature selection is equally important for classification: applied appropriately, it can greatly improve computational efficiency. Selective naïve Bayes is the feature selection mechanism most commonly adopted for naïve Bayesian classifiers, because its operating principle is simple and the selected attribute subset does improve classification accuracy. Few past studies have combined discretization with feature selection, and because hybrid discretization was proposed only recently, no study has examined combining it with feature selection. The purpose of this study is to combine the two mechanisms and observe whether the combination significantly improves classification accuracy. Three methods are proposed. Method one performs feature selection first and then hybrid discretization, and it executes quickly. Methods two and three both, broadly speaking, perform hybrid discretization first and feature selection afterwards, but they are designed differently to balance computational efficiency against classification accuracy: method two does not consider every possible attribute combination during hybrid discretization, whereas method three accepts higher computational complexity in order to consider all of them. Experiments show that, compared with discretizing all attributes of a data set by the same method and then applying selective naïve Bayes feature selection, all three proposed methods achieve better classification accuracy, which demonstrates that combining hybrid discretization with selective naïve Bayes feature selection is beneficial; among the three, method three performs best.
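To make the idea concrete, here is a minimal Python/NumPy sketch of hybrid discretization, not the thesis's actual implementation. Each continuous column is discretized by several candidate methods (ten-bin, equal-frequency, and proportional discretization, three of the four methods the thesis lists in Section 3.4, the last in the spirit of Yang and Webb, 2009), and a caller-supplied `score` callback, a hypothetical stand-in for a criterion such as the cross-validated accuracy of a single-attribute naïve Bayes model, decides which variant to keep. The names `hybrid_discretize`, `equal_width`, `equal_frequency`, and `proportional` are illustrative assumptions.

```python
import numpy as np

def equal_width(x, k=10):
    """Ten-bin discretization: split the attribute's range into k equal-width intervals."""
    edges = np.linspace(x.min(), x.max(), k + 1)[1:-1]
    return np.digitize(x, edges)

def equal_frequency(x, k=10):
    """Equal-frequency discretization: each interval holds roughly the same number of values."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1))[1:-1]
    return np.digitize(x, edges)

def proportional(x):
    """Proportional discretization: the number of intervals grows with sqrt(n)."""
    return equal_frequency(x, max(2, int(round(np.sqrt(len(x))))))

CANDIDATES = (equal_width, equal_frequency, proportional)

def hybrid_discretize(X, y, score):
    """Hybrid discretization: pick the best candidate method for every column.

    `score(column, y)` is a caller-supplied criterion, e.g. the cross-validated
    accuracy of a one-attribute naive Bayes model on the discretized column.
    """
    out = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        variants = [f(X[:, j]) for f in CANDIDATES]
        out[:, j] = max(variants, key=lambda v: score(v, y))
    return out
```

In this framing, the unified-discretization baseline the abstracts compare against corresponds to skipping the per-column search and applying one and the same candidate to every column.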
Abstract (English): The naïve Bayesian classifier is widely used for classification problems because of its computational efficiency and competitive accuracy. Discretization is one of the major approaches for processing continuous attributes for naïve Bayesian classifiers. Hybrid discretization sets the discretization method for each continuous attribute individually, and a previous study found it to be a better approach for improving the performance of the naïve Bayesian classifier than unified discretization. Selective naïve Bayes, abbreviated as SNB, is an important feature selection method for naïve Bayesian classifiers; it improves both efficiency and accuracy by removing redundant and irrelevant attributes. The objective of this study is to develop methods that combine hybrid discretization with feature selection, and three methods are proposed for this purpose. Method one, the most efficient, executes hybrid discretization after feature selection. Methods two and three generally perform hybrid discretization first, followed by feature selection. Method two transforms continuous attributes without considering discrete attributes, while method three determines the best discretization method for each continuous attribute by searching all possibilities. The experimental results show that, in general, the three methods combining hybrid discretization and feature selection all perform better than the method with unified discretization and feature selection, and that method three is the best.
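The SNB step itself can likewise be sketched as the greedy forward search described by Langley and Sage (1994), cited in the references below: start from the empty attribute set and repeatedly add the attribute whose inclusion raises classification accuracy most, stopping when no attribute helps. For brevity this sketch assumes all attributes are already discrete and scores subsets by training-set accuracy; the thesis's evaluation protocol (Section 3.6) may differ, and `nb_accuracy` and `selective_nb` are hypothetical names.

```python
import numpy as np

def nb_accuracy(X, y):
    """Training-set accuracy of a discrete naive Bayes model with Laplace smoothing."""
    classes = np.unique(y)
    n, d = X.shape
    log_post = np.zeros((n, len(classes)))
    for ci, c in enumerate(classes):
        mask = (y == c)
        log_post[:, ci] = np.log(mask.mean())  # log class prior
        for j in range(d):
            vals = np.unique(X[:, j])
            counts = np.array([(X[mask, j] == v).sum() for v in vals], dtype=float)
            log_p = np.log((counts + 1.0) / (counts.sum() + len(vals)))  # Laplace smoothing
            lut = dict(zip(vals.tolist(), log_p))
            log_post[:, ci] += [lut[v] for v in X[:, j].tolist()]
    return (classes[log_post.argmax(axis=1)] == y).mean()

def selective_nb(X, y):
    """Greedy forward search (Langley and Sage, 1994): repeatedly add the
    attribute that improves accuracy most; stop when no attribute helps."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        acc, j = max((nb_accuracy(X[:, selected + [j]], y), j) for j in remaining)
        if acc <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = acc
    return selected, best
```

Under this reading of the abstracts, method one would roughly amount to running `selective_nb` on unified-discretized data and then re-discretizing only the selected columns with `hybrid_discretize`, while methods two and three would run the per-column search before the forward selection.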
Abstract
Acknowledgements
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Process and Organization
Chapter 2 Literature Review
2.1 The Naïve Bayesian Classifier
2.2 Discretization of Continuous Attributes
2.3 Feature Selection
2.3.1 Feature Selection for Discrete Attributes
2.3.2 Feature Selection for Continuous Attributes
2.4 Combining Discretization and Feature Selection
Chapter 3 Research Methods
3.1 Method One
3.2 Method Two
3.3 Method Three
3.4 Discretization Methods
3.4.1 Ten-Bin Discretization
3.4.2 Equal-Frequency Discretization
3.4.3 Minimum-Entropy Discretization
3.4.4 Proportional Discretization
3.5 Nonparametric Measurement of Continuous Attribute Importance
3.6 Method Evaluation
Chapter 4 Empirical Study
4.1 Data Set Attributes
4.2 Method Comparison
4.2.1 Unified Discretization First
4.2.2 Hybrid Discretization First
4.2.3 Method One versus Method Three
4.3 Summary
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

Asuncion, A. and Newman, D. J. (2007). UCI machine learning repository, http://www.ics.uci.edu/~mlearn/MLRepository.html. Irvine, CA: University of California, School of Information and Computer Science.

Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning, Proceedings of the 9th European Conference on Artificial Intelligence, Stockholm, Sweden, 147-150.

Clark, P. and Niblett, T. (1989). The CN2 induction algorithm, Machine Learning, 3, 261-283.

Domingos, P. and Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss, Machine Learning, 29, 103-130.

Dougherty, J., Kohavi, R., and Sahami, M. (1995). Supervised and unsupervised discretization of continuous features, Proceedings of the 12th International Conference on Machine Learning, San Francisco, 192-202.

Fayyad, U. M. and Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambery, France, 1022-1027.

Ferreira, A. and Figueiredo, M. (2011). Unsupervised joint feature discretization and selection. Pattern Recognition and Image Analysis, 6669, 200-207.

John, G. H., Kohavi, R., and Pfleger, K. (1994). Irrelevant features and the subset selection problem. Proceedings of ICML-94, 11th International Conference on Machine Learning, New Brunswick, NJ, 121-129.

John, G. H. and Langley, P. (1995). Estimating continuous distributions in Bayesian classifiers. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, Canada, 338-345.

Kerber, R. (1992). ChiMerge: Discretization of numeric attributes. Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, CA, 123-128.

Kwak, N. and Choi, C. H. (2002). Input feature selection by mutual information based on Parzen window. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 1667-1671.

Langley, P., Iba, W., and Thompson, K. (1992). An analysis of Bayesian classifiers, Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, CA, 223-228.

Langley, P. and Sage, S. (1994). Induction of selective Bayesian classifiers. Proceedings of the UAI-94, 10th International Conference on Uncertainty in Artificial Intelligence, Seattle, WA, 399-406.

Li, Y., Hu, S. J., Yang, W. J., Sun, G. Z., Yao, F. W., and Yang, G. (2009). Similarity-based feature selection for learning from examples with continuous values. Advances in Knowledge Discovery and Data Mining, Springer, 5476, 957-964.

Liu, H. and Setiono, R. (1995). Feature selection and discretization of numeric attributes. Proceedings of the 7th IEEE International Conference on Tools with Artificial Intelligence, 388-391.

Mejia-Lavalle, M., Morales, E. F., and Rodriguez, G. (2006). Fast feature selection method for continuous attributes with nominal class. Proceedings of 5th Mexican International Conference on Artificial Intelligence (MICAI'06), 142-150.

Peng, H., Long, F., and Ding, C. (2005). Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1226-1238.

Pernkopf, F. (2005). Bayesian network classifiers versus selective k-NN classifier. Pattern Recognition, 38, 1-10.

Pudil, P., Novovicova, J., and Kittler, J. (1994). Floating search methods in feature selection, Pattern Recognition Letters, 15, 1119-1125.

Ribeiro, X. M., Traina, A. J. M., and Traina, C. Jr. (2008). A new algorithm for data discretization and feature selection. Proceedings of the 2008 ACM Symposium on Applied Computing, New York, USA, 953-954.

Senthilkumar, J., Manjula, D., and Krishnamoorthy, R. (2009). NANO: A new supervised algorithm for feature selection with discretization. Proceedings of the IEEE International Conference on Advanced Computing (IACC 2009), Patiala, India, 1515-1520.

Wong, T. T. (2012). A hybrid discretization method for naïve Bayesian classifiers, Pattern Recognition, 45, 2321-2325.

Yang, Y. and Webb, G. I. (2009). Discretization for naive-Bayes learning: managing discretization bias and variance, Machine Learning, 74, 39-74.

Zhang, M. L., Peña, J. M., and Robles, V. (2009). Feature selection for multi-label naïve Bayes classification. Information Sciences, 179, 3218-3229.
