
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: 伍碧那 (Bi-Na Wu)
Title: 適用於不同分類器的混合型離散化方法 (A hybrid discretization method for classification algorithms)
Advisor: 翁慈宗 (Tzu-Tsung Wong)
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Institute of Information Management (資訊管理研究所)
Discipline: Computer Science
Field: General Computer Science
Document type: Academic thesis
Year of publication: 2014
Graduating academic year: 102 (ROC calendar)
Language: Chinese
Pages: 49
Keywords (Chinese): 混合型離散化, 網路最佳化模型, 動態規劃, 分類器
Keywords (English): classifier, dynamic programming, hybrid discretization method, network optimization model
Usage statistics:
  • Cited: 2
  • Views: 178
  • Downloads: 0
Classification is a data-mining approach in which the class of each instance is predicted from its attribute values. Most data sets contain continuous attributes, and classifiers designed for discrete attributes generally require those attributes to be discretized first, so the choice of discretization method can affect a classifier's predictive performance. Hybrid discretization discretizes each continuous attribute individually, searching for the method that suits it best, and can yield higher classification accuracy than applying one discretization method uniformly to every attribute in a data set. Previous work on hybrid discretization, however, has focused mainly on the naïve Bayesian classifier and relies on classification results to decide which method is most suitable, so the discretization cannot be completed entirely during data preprocessing. The purpose of this study is therefore to build a hybrid discretization method that applies to other classifiers handling discrete attributes and that finishes all discretization in the preprocessing step.

This study draws on network optimization from operations research: the hybrid discretization problem is converted into a network optimization model, the associations among attributes and between attributes and the class serve as the evaluation measure, and dynamic programming finds an optimal path through the network, which corresponds to the most suitable hybrid discretization. For validation, 20 data sets were classified with decision trees, naïve Bayesian classifiers, and rule-based classifiers. Compared with unified discretization, hybrid discretization improved the classification accuracy on most data sets for the naïve Bayesian and rule-based classifiers, while for decision trees the two approaches performed about equally, indicating that the proposed method is a feasible way to select a hybrid discretization combination.
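Two of the uniform discretization methods the thesis considers (equal-width and equal-frequency) can be sketched in a few lines of NumPy. The function names and the toy data below are illustrative only, not code from the thesis:

```python
import numpy as np

def equal_width_bins(values, k):
    """Split the value range into k intervals of equal width and
    return the bin index of each value."""
    lo, hi = values.min(), values.max()
    edges = np.linspace(lo, hi, k + 1)
    # digitize against the interior cut points yields indices 0..k-1
    return np.digitize(values, edges[1:-1])

def equal_frequency_bins(values, k):
    """Split the values into k intervals holding roughly the same
    number of observations, using sample quantiles as cut points."""
    quantiles = np.quantile(values, np.linspace(0, 1, k + 1))
    return np.digitize(values, quantiles[1:-1])

values = np.array([1.0, 2.0, 2.5, 3.0, 10.0, 11.0])
print(equal_width_bins(values, 2))      # cut at the range midpoint (6.0)
print(equal_frequency_bins(values, 2))  # cut at the median (2.75)
```

The same skewed sample lands in different bins under the two methods, which is exactly why the choice of method per attribute can matter to a classifier.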
Discretization is one of the major approaches for preparing continuous attributes for classifiers. Hybrid discretization sets the discretization method for each continuous attribute individually. A previous study found that hybrid discretization improves the performance of the naïve Bayesian classifier more than unified discretization, but that approach determines the discretization method for each attribute based on whether classification accuracy improves. The objective of this study is to develop a hybrid discretization method applicable to various classifiers that determines the discretization method for each attribute in the data preprocessing step instead of relying on accuracy. This study first builds a network optimization model based on the associations among the attributes and the class. Dynamic programming is then employed to find the optimal solution for the network, and this solution indicates the discretization method for each continuous attribute. The classification tools for testing the method are decision trees, naïve Bayesian classifiers, and rule-based classifiers. Experimental results on 20 data sets show that the computational cost of the method is low and that, in general, hybrid discretization performs better than unified discretization with naïve Bayesian and rule-based classifiers, but not with decision trees.
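The dynamic-programming step described in the abstract can be pictured as a stage-wise best-path search: each continuous attribute is a stage, each candidate discretization method a node, and edge weights stand in for the association scores computed during preprocessing. The sketch below is a hypothetical reconstruction under those assumptions; the method names, the score layout, and the scores themselves are not taken from the thesis:

```python
# Hypothetical reconstruction: stages = continuous attributes, nodes = the
# four candidate discretization methods, edge weights = association scores.
METHODS = ["equal_width", "equal_freq", "proportional", "min_entropy"]

def best_assignment(scores):
    """scores[i][m][n] is the (assumed precomputed) association score for
    using method m on attribute i and method n on attribute i + 1.
    Returns the method sequence with the maximum total score."""
    k = len(METHODS)
    best = [0.0] * k   # best total score of any path ending at each node
    back = []          # chosen predecessor per node, one list per stage
    for stage in scores:
        new_best = [float("-inf")] * k
        choice = [0] * k
        for n in range(k):
            for m in range(k):
                cand = best[m] + stage[m][n]
                if cand > new_best[n]:
                    new_best[n], choice[n] = cand, m
        back.append(choice)
        best = new_best
    end = max(range(k), key=best.__getitem__)
    path = [end]
    for choice in reversed(back):  # walk predecessors back to the first attribute
        path.append(choice[path[-1]])
    return [METHODS[i] for i in reversed(path)]
```

With one 4×4 score matrix per consecutive pair of attributes, the returned list assigns one discretization method to each continuous attribute in a single preprocessing pass, which is the behavior the abstract claims for the network model.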
Abstract I
Acknowledgements V
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Research Objectives 2
1.3 Organization of the Thesis 2
Chapter 2 Literature Review 3
2.1 Discretization Methods 3
2.2 Associations between Attributes and the Class 6
2.3 Dynamic Programming 9
Chapter 3 Research Method 11
3.1 Research Procedure 11
3.2 Ordering of Continuous Attributes 12
3.3 Discretization Methods 14
3.3.1 Equal-Width Discretization 15
3.3.2 Equal-Frequency Discretization 15
3.3.3 Proportional Discretization 16
3.3.4 Minimum-Entropy Discretization 16
3.4 Network Optimization Model 17
3.4.1 Constructing the Network Model 17
3.4.2 Association Measures 18
3.5 Dynamic Programming 21
3.6 Classifiers 25
3.6.1 Decision Trees 25
3.6.2 Naïve Bayesian Classifiers 25
3.6.3 Rule-Based Classifiers 26
3.6.4 K-Fold Cross-Validation 27
Chapter 4 Empirical Study 28
4.1 Data Sets 28
4.2 Dynamic Programming Results 29
4.3 Validation of Classification Results 31
4.3.1 Decision Trees 32
4.3.2 Naïve Bayesian Classifiers 34
4.3.3 Rule-Based Classifiers 36
4.3.4 Summary of Accuracy Validation 38
4.4 Classification Validation of Unified Discretization 39
4.4.1 Decision Trees 39
4.4.2 Naïve Bayesian Classifiers 40
4.4.3 Rule-Based Classifiers 41
4.5 Summary 43
Chapter 5 Conclusions and Future Work 45
5.1 Conclusions 45
5.2 Future Work 46
References 47

Ballesteros, A. J. T., Martínez, C. H., Riquelme, J. C., and Ruiz, R. (2013). Feature selection to enhance a two-stage evolutionary algorithm in product unit neural networks for complex classification problems. Neurocomputing, 114, 107–117.
Bellman, R. (1957). Dynamic Programming. Princeton, NJ: Princeton University Press.
Cannas, L. M., Dessi, N., and Pes, B. (2013). Assessing similarity of feature selection techniques in high-dimensional domains. Pattern Recognition Letters, 34, 1446–1453.
Concepción, M. Á. Á. D. L., Abril, L. G., Morillo, L. M. S., and Ramírez, J. A. O. (2013). An adaptive methodology to discretize and select features. Artificial Intelligence Research, 2(2), 77-86.
Fayyad, U. M. and Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. The 13th International Joint Conference on Artificial Intelligence (IJCAI), 1022-1029.
García, S., Luengo, J., Sáez, J. A., López, V., and Herrera, F. (2013). A survey of discretization techniques: taxonomy and empirical analysis in supervised learning. IEEE Transactions on Knowledge and Data Engineering, 25(4), 734-750.
Golding, D., Nelwamondo, F. V., and Marwala, T. (2013). A dynamic programming approach to missing data estimation using neural networks. Information Sciences, 237, 49–58.
Gu, Q., Li, Z., and Han, J. (2012). Generalized fisher score for feature selection. The 27th Conference on Uncertainty in Artificial Intelligence (UAI), Barcelona, Spain, arXiv preprint arXiv:1202.3725.
Hu, Q., Pedrycz, W., Yu, D., and Lang, J. (2010). Selecting discrete and continuous features based on neighborhood decision error minimization. IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics, 40(1), 137-150.
Jiang, S. Y., Li, X., Zheng, Q., and Wang, L. X. (2009). Approximate equal frequency discretization method. 2009 WRI Global Congress on Intelligent Systems (GCIS '09), 3, 514-518.
Jung, Y. G., Kim, K. M., and Kwon, Y. M. (2012). Using weighted hybrid discretization method to analyze climate changes. Computer Applications for Graphics, Grid Computing, and Industrial Environment, Communications in Computer and Information Science, 351, Springer Berlin Heidelberg, 189–195.
Li, M., Deng, S. B., Feng, S., and Fan, J. (2011). An effective discretization based on Class-Attribute Coherence Maximization. Pattern Recognition Letters, 32, 1962–1973.
Liu, H., Sun, J., Liu, L., and Zhang, H. (2009). Feature selection with dynamic mutual information. Pattern Recognition, 42, 1330-1339.
Lustgarten, J. L., Visweswaran, S., Gopalakrishnan, V., and Cooper, G. F. (2011). Application of an efficient Bayesian discretization method to biomedical data. BMC Bioinformatics, 12, 309.
Park, C. E. and Lee, M. (2009). A SVM-based discretization method with application to associative classification. Expert Systems with Applications, 36, 4784–4787.
Pisica, I., Taylor, G., and Lipan, L. (2013). Feature selection filter for classification of power system operating states. Computers and Mathematics with Applications, 66, 1795–1807.
Sakar, C. O., Kursun, O., and Gurgen, F. (2012). A feature selection method based on kernel canonical correlation analysis and the minimum Redundancy–Maximum Relevance filter method. Expert Systems with Applications, 39, 3432–3437.
Sang, Y., Jin, Y., Li, K., and Qi, H. (2013). UniDis: a universal discretization technique. Journal of Intelligent Information Systems, 40, 327–348.
Shen, C. C. and Chen, Y. L. (2008). A dynamic-programming algorithm for hierarchical discretization of continuous attributes. European Journal of Operational Research, 184, 636–651.
Tian, D., Zeng, X. J., and Keane, J. (2011). Core-generating approximate minimum entropy discretization for rough set feature selection in pattern classification. International Journal of Approximate Reasoning, 52, 863–880.
Wong, T. T. (2012). A hybrid discretization method for naive Bayesian classifiers. Pattern Recognition, 45, 2321–2325.
Yu, L. and Liu, H. (2003). Feature selection for high-dimensional data: A fast correlation-based filter solution. Proceedings of the Twentieth International Conference on Machine Learning, Washington DC, 856-863.
Zhao, J., Han, C. Z., Wei, B., and Han, D. Q. (2012). A UMDA-based discretization method for continuous attributes. Advanced Materials Research, 403-408, 1834-1838.
Zou, L., Yan, D., Karimi, H. R., and Shi, P. (2013). An algorithm for discretization of real value attributes based on interval similarity. Journal of Applied Mathematics, 1-8.
