Author: 賴妤函
Author (English): Yu-Han Lai
Title: 運用高效率多變量分類器於多目標分類問題
Title (English): Using of High-Efficient Multivariate Classifier for Multi-class Classification Problems
Advisor: 林泓毅
Advisor (English): Hung-Yi Lin
Degree: Master's
Institution: 臺中技術學院 (Taichung Institute of Technology)
Department: Master's Program, Department of Distribution Management
Discipline: Business and Management
Field: Marketing and Distribution
Thesis Type: Academic thesis
Publication Year: 2011
Graduation Academic Year: 99 (ROC calendar)
Language: Chinese
Pages: 51
Keywords (Chinese): 多變量分析; 多類別分類; 屬性選擇; 屬性提取; 主成份分析; 熵值
Keywords (English): multivariate analyses; multi-class problems; feature selection; feature extraction; PCA; entropy
Usage statistics:
  • Cited by: 1
  • Views: 180
  • Downloads: 0
  • Bookmarked: 0
Abstract (Chinese):
This study addresses multi-class classification problems over large datasets. We propose a new multi-class inductive learning model built on four stages: feature evaluation, feature selection, feature extraction, and inductive learning. In the feature evaluation stage, the traditional information gain measure is extended with an observation of data diversity. In the feature selection stage, correlation analysis reinforces relevance analysis to provide a more reliable selection mechanism. The feature extraction stage applies principal component analysis (PCA), a multivariate technique, to construct the basis of the classifier; the inductive learning stage then follows. To validate the approach, experiments are conducted on five large datasets with 4 to 7 classes and up to more than 4,000 instances, using 10-fold cross-validation to reduce data-induced bias. We evaluate the classifier's classification accuracy, discrimination capability, and execution efficiency. The results show that, compared with the C4.5, CART, SVM, and NaiveBayes classifiers, our multivariate classifier achieves competitive accuracy and discrimination capability; the main contribution of this work is the substantial improvement in time efficiency obtained with the multivariate classifier.
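The four-stage design summarized above maps onto a short pipeline sketch. The Python/scikit-learn code below is a minimal illustration under assumed stand-ins, not the thesis' actual method: mutual information approximates the diversity-aware information gain of stage 1, a pairwise-correlation filter approximates the correlation/relevance analysis of stage 2, and logistic regression stands in for the unspecified inductive learner of stage 4. The helper names and thresholds (evaluate_features, select_uncorrelated, top_k, max_corr, n_components) are hypothetical.

```python
# Sketch of the four-stage model described in the abstract; the exact
# scoring formulas of the thesis are not reproduced here.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


def evaluate_features(X, y, top_k=20):
    """Stage 1: rank features by an information-gain-style score
    (mutual information used as a stand-in)."""
    scores = mutual_info_classif(X, y, random_state=0)
    return np.argsort(scores)[::-1][:top_k]


def select_uncorrelated(X, candidate_idx, max_corr=0.9):
    """Stage 2: keep a candidate only if it is not highly correlated
    with any feature already selected (crude correlation filter)."""
    selected = []
    for j in candidate_idx:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < max_corr
               for k in selected):
            selected.append(j)
    return selected


def build_multivariate_classifier(X, y, n_components=5):
    """Stages 3-4: PCA-based feature extraction followed by inductive
    learning on the extracted components."""
    idx = select_uncorrelated(X, evaluate_features(X, y))
    model = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=min(n_components, len(idx)))),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X[:, idx], y)
    return model, idx
```

Running stages 1 and 2 ahead of PCA shrinks the matrix that PCA and the learner see, which is consistent with the time-efficiency gains the abstract reports; the returned idx must also be applied to new samples before calling model.predict.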

Abstract (English):
This paper proposes a novel classification model for dealing with multi-class problems over large numbers of features and instances. Our classification model comprises four stages: feature evaluation, feature selection, feature extraction, and inductive learning. In the first stage, the investigation of data diversity enhances the classification effect of information gain. The second stage strengthens relevance analysis by introducing correlation analysis, providing a more reliable mechanism for feature selection. In the third stage, principal component analysis is applied to generate the multivariate classifier. The final stage proceeds with inductive learning. To verify our methods, five large datasets with 4 to 7 classes and thousands of instances are used in our experiments. Accuracy, discrimination capability, and performance are employed to evaluate our multivariate classifier against C4.5, CART, SVM, and NaiveBayes. The experimental results show that, although our classifier attains accuracy and discrimination capability similar to the four conventional classifiers, model training time and classification efficiency are significantly improved.
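The comparison protocol described in the abstracts (10-fold cross-validation against C4.5, CART, SVM, and NaiveBayes, scored on accuracy and timing) can be sketched with a small harness like the one below. It is a hypothetical setup: scikit-learn provides no C4.5 implementation, so an entropy-criterion DecisionTreeClassifier stands in for it, and load_digits is only a placeholder multi-class dataset rather than any of the five datasets used in the thesis.

```python
# 10-fold cross-validation over a multi-class dataset, reporting mean
# accuracy and mean training (fit) time for each baseline classifier.
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)  # placeholder multi-class dataset

baselines = {
    "C4.5-like tree": DecisionTreeClassifier(criterion="entropy"),
    "CART": DecisionTreeClassifier(criterion="gini"),
    "SVM": SVC(kernel="rbf"),
    "NaiveBayes": GaussianNB(),
}

for name, clf in baselines.items():
    cv = cross_validate(clf, X, y, cv=10, scoring="accuracy")
    print(f"{name:15s} accuracy={cv['test_score'].mean():.3f} "
          f"fit_time={cv['fit_time'].mean():.3f}s")
```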

Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Tables v
List of Figures vi
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Research Objectives 5
Chapter 2 Literature Review and Mathematical Background 7
2.1 Literature Review 7
2.2 Mathematical Background 14
Chapter 3 Research Methods 17
3.1 System Architecture 17
3.2 Feature Evaluation 19
3.3 Heuristic Feature Selection 21
3.4 Multivariate Classifier 25
Chapter 4 Empirical Study 27
4.1 Experimental Data 27
4.2 Demonstration of the Multivariate Classifier Generation Process (using the bodyfat dataset) 29
4.3 Experimental Results and Overall Analysis 33
Chapter 5 Conclusions and Recommendations 44
5.1 Conclusions 44
5.2 Suggestions for Future Research 45
Conference Participation and Publications 46
References 47





