National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


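Graduate student: Keng-Wei Chang (張耿維)
Thesis title: Extended Naïve Bayes Classifier for Mixed Data (改良式質樸貝氏分類器分類混合型資料)
Advisor: Chung-Chian Hsu (許中川)
Degree: Master's
Institution: National Yunlin University of Science and Technology (國立雲林科技大學)
Department: Master's Program, Department of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Publication year: 2005
Graduation academic year:
Language: Chinese
Number of pages:
Keywords (translated from Chinese): hypothesis test for two population means, classification, discretization, mixed data, naïve Bayes classifier
Keywords (English): statistic test, mixed data, classification, discretization, Naïve Bayes Classifier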


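Among current classification algorithms, the naïve Bayes classifier (hereafter "the Bayes classifier"), an inductive algorithm, has shown good accuracy on many classification problems. Under the assumption that attributes are conditionally independent, it combines the joint information from many attributes as the basis for the final prediction. Although this assumption does not hold in practice, studies have confirmed that the Bayes classifier still solves classification problems effectively. In terms of efficiency, the Bayes classifier is quite fast at test time: a single scan of the dataset is enough before it can predict. Despite these advantages, the traditional Bayes classifier is limited to categorical data. In other words, it cannot be applied to data that contains both categorical and numeric attributes. Some earlier papers discretized numeric attributes into categorical ones before applying the Bayes classifier, but classification performance is then easily affected by differences in the discretization criterion. Later papers proposed handling numeric data with a Gaussian distribution, but estimating the population distribution from a single point tends to produce a high rate of erroneous probability estimates. The traditional Bayes classifier therefore still cannot reasonably handle the classification of mixed data. This study extends the traditional Bayes classifier to mixed categorical and numeric data; we call the result the extended Bayes classifier. Using statistical theory to compute the classification probabilities of numeric attributes, we developed an algorithm capable of handling mixed data. Experimental results show that the proposed classifier achieves good classification performance and execution efficiency compared with three classification algorithms: classification and regression trees, decision trees, and multilayer perceptrons.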
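
The sensitivity to the discretization criterion mentioned above can be illustrated with a small sketch. It is not taken from the thesis: the two criteria (equal-width and equal-frequency binning), the helper names, and the sample values are all assumptions chosen for illustration. The same numeric column ends up in very different categories depending on the criterion, so the counts a Bayes classifier would collect differ as well.

```python
def equal_width_edges(values, k):
    """Cut points that split the value range into k bins of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Cut points that put roughly the same number of values in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]


def bin_index(x, edges):
    """Index of the bin that x falls into (number of cut points at or below x)."""
    return sum(x >= e for e in edges)


# Hypothetical numeric attribute with one outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

ew_edges = equal_width_edges(values, 3)
ef_edges = equal_frequency_edges(values, 3)

ew_bins = [bin_index(v, ew_edges) for v in values]
ef_bins = [bin_index(v, ef_edges) for v in values]

print("equal width    :", ew_edges, ew_bins)
print("equal frequency:", ef_edges, ef_bins)
```

With equal-width bins the outlier pushes almost every value into the first category, while equal-frequency bins spread the values evenly. A classifier trained on the discretized column sees different evidence in the two cases.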

The naïve Bayes induction algorithm has been shown to be surprisingly accurate on many classification tasks, even when the conditional independence assumption on which it is based is violated. Naïve Bayes classifiers are very robust to irrelevant attributes, and classification takes evidence from many attributes into account to make the final prediction. They are generally easy to understand, and their induction is extremely fast, requiring only a single pass through the data. However, the algorithm is limited to categorical or discrete data. In other words, it cannot be applied to the classification of mixed data, which includes both categorical and numeric attributes. The traditional way to deal with numeric data is to discretize numeric attributes into symbols. However, the choice of discretization criterion has a significant effect on performance. Moreover, several studies have recently employed the normal distribution to handle numeric data, but using only one value to estimate the population easily leads to incorrect estimates. Hence, research on classifying mixed data with naïve Bayes classifiers has not been very successful. In this paper, we propose a classification method, extended naïve Bayes (ENB), which is capable of handling mixed data. For categorical data, we use the original approach of the naïve Bayes algorithm to calculate the probabilities of categorical values. For continuous data, we adopt statistical theory, taking into account not only the mean but also the variance of the numeric values. For an unknown input pattern, the product of the probabilities and the P-values is calculated, and the class that yields the maximum product is designated as the target class to which the input pattern belongs. The experimental results demonstrate the efficiency of our algorithm in comparison with other classification algorithms such as CART, DT, MLP, and NBG.
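
The decision rule described in the abstract (multiply categorical probabilities by P-values for numeric attributes, then pick the class with the largest product) might be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not specify the test statistic, so the two-sided normal P-value of the standardized distance from the class mean is an assumption, as are the function names, the absence of smoothing, and the toy data.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels, numeric_idx):
    """Single pass over the data: count categorical values, accumulate numeric sums."""
    class_count = Counter()
    cat_count = defaultdict(Counter)                # (class, attribute) -> value counts
    num_stats = defaultdict(lambda: [0, 0.0, 0.0])  # (class, attribute) -> [n, sum, sum of squares]
    for row, y in zip(rows, labels):
        class_count[y] += 1
        for j, v in enumerate(row):
            if j in numeric_idx:
                stats = num_stats[(y, j)]
                stats[0] += 1
                stats[1] += v
                stats[2] += v * v
            else:
                cat_count[(y, j)][v] += 1
    return class_count, cat_count, num_stats


def p_value(x, n, total, total_sq):
    """Assumed statistic: two-sided normal P-value of x against the class mean.

    Uses the sample variance, so each class needs at least two numeric values.
    """
    mean = total / n
    var = (total_sq - n * mean * mean) / (n - 1)
    if var <= 0.0:
        return 1.0 if x == mean else 0.0
    z = abs(x - mean) / math.sqrt(var)
    return math.erfc(z / math.sqrt(2.0))


def predict(row, model, numeric_idx):
    """Return the class with the largest product of prior, probabilities, and P-values."""
    class_count, cat_count, num_stats = model
    total = sum(class_count.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_count.items():
        score = n_c / total                          # class prior
        for j, v in enumerate(row):
            if j in numeric_idx:
                score *= p_value(v, *num_stats[(c, j)])
            else:
                score *= cat_count[(c, j)][v] / n_c  # relative frequency, no smoothing
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical mixed data: (color, weight) with class labels "a" and "b".
rows = [("red", 1.0), ("red", 1.2), ("blue", 0.8),
        ("blue", 3.0), ("blue", 3.4), ("red", 2.9)]
labels = ["a", "a", "a", "b", "b", "b"]
numeric_idx = {1}

model = train(rows, labels, numeric_idx)
print(predict(("red", 1.1), model, numeric_idx))
print(predict(("blue", 3.2), model, numeric_idx))
```

Because the P-value depends on both the class mean and the class variance, a value far from the mean of a tightly clustered class is penalized more heavily than the same distance in a widely spread class, which matches the abstract's point about considering variance as well as the average.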

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
1. Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Scope and Limitations
1.4 Thesis Organization
2. Literature Review
2.1 Classification Algorithms
2.1.1 Classification and Regression Trees
2.1.2 Decision Trees
2.1.3 Multilayer Perceptrons
2.2 Naïve Bayes Classifier
2.3 Comparison of Classification Algorithms
2.4 Discretization of Numeric Attributes
2.5 Gaussian Distribution for Numeric Attributes
3. Research Method
3.1 Problem Definition
3.2 Traditional Bayes Classifier for Categorical Attributes
3.3 Hypothesis Test for Equality of Two Population Means for Numeric Attributes
3.4 P-Value Mixed-Data Bayes Classifier
3.5 Algorithm Steps
3.6 Exponential Mixed-Data Bayes Classifier
3.7 Worked Example
3.8 Evaluating Classification Results
4. Experiments
4.1 Parameter Experiments for the P-Value Mixed-Data Bayes Classifier
4.2 Experimental Results
4.3 Analysis of Correct and Incorrect Classification Proportions
4.4 Stability Analysis of Classification Performance
4.5 Execution Efficiency Analysis
5. Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

