National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

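Graduate student: 鄭凱駿
Graduate student (English name): Kai-Chun Cheng
Thesis title: 應用資料精減於破產預測之研究
Thesis title (English): Data Reduction in Bankruptcy Prediction
Advisor: 蔡志豐
Advisor (English name): Chih-fong Tsai
Degree: Master's
Institution: National Chung Cheng University
Department: Graduate Institute of Accounting and Information Technology
Discipline: Business and Management
Field: Accounting
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar, i.e. 2008/09)
Language: Chinese
Pages: 71
Keywords (Chinese, translated): bankruptcy prediction, data mining, data reduction, K-means
Keywords (English): bankruptcy prediction, data mining, data reduction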
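Chinese abstract (translated):
Previous studies that apply data mining techniques to bankruptcy prediction have constructed a variety of prediction models. In hybrid models, however, data pre-processing has concentrated on feature selection, and relatively few studies address data reduction. Appropriate data reduction can clean the training data by deleting atypical, unrepresentative instances, so that the trained model predicts better. This study therefore builds a data reduction method on K-means clustering, which identifies the center of each cluster of data: the distance from every instance to its cluster center is computed, and the instances farthest from the center are deleted at different proportions. Using four publicly downloadable datasets commonly used in bankruptcy prediction, we remove different proportions of the data and then train models with four classification techniques: neural networks, decision trees, logistic regression, and support vector machines. The results show that, for the datasets reduced at different proportions, the average cross-validated accuracy of the models trained with the four classifiers is in most cases higher than before data reduction. Furthermore, as the reduction proportion increases and more of the instances far from the cluster centers are removed, the resulting datasets are cleaner and average accuracy tends to rise.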
English abstract:
Prior research applying data mining techniques to bankruptcy prediction has focused mainly on constructing effective prediction models. In particular, many studies develop hybrid models that rely on feature selection in the pre-processing stage, but very few emphasize data reduction. Data reduction can make the training dataset cleaner by removing outliers, which can improve prediction accuracy. The purpose of this thesis is therefore to build a data reduction method that uses K-means to find the center of each cluster and then calculates the distance from every instance in a cluster to its cluster center. The instances farthest from their center are treated as outliers and removed at different percentages. We use four datasets commonly used in the bankruptcy prediction domain and, after data reduction, employ neural networks, decision trees, logistic regression, and support vector machines as the prediction models. The experimental results show that models trained by the four classifiers on the four reduced datasets are, in general, more accurate than models trained without data reduction. Moreover, accuracy increases as the reduction percentage increases.
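The reduction step described in the abstracts (cluster with K-means, measure each instance's distance to its cluster center, drop the farthest share) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. It uses Euclidean distance and plain Lloyd-style K-means with random initial centers. It also ranks all distances globally instead of applying a cutoff per cluster. The function names `kmeans` and `reduce_by_distance` are hypothetical. The record does not state the thesis's choice of k, its distance measure, or whether clustering was run separately per class.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style K-means (assumed variant); returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    assign = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        new_assign = [min(range(k), key=lambda c: euclidean(p, centers[c]))
                      for p in points]
        if new_assign == assign:
            break  # assignments unchanged, so the centers are stable
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


def reduce_by_distance(points, labels, k, ratio, seed=0):
    """Remove the `ratio` share of instances farthest from their own cluster center.

    Assumption: one global ranking of distances over the whole dataset;
    a per-cluster cutoff would be an equally plausible reading of the method.
    """
    centers, assign = kmeans(points, k, seed=seed)
    dists = [euclidean(p, centers[a]) for p, a in zip(points, assign)]
    n_drop = round(ratio * len(points))  # number of instances to discard
    order = sorted(range(len(points)), key=lambda i: dists[i])  # nearest first
    keep = sorted(order[:len(points) - n_drop])  # restore original order
    return [points[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    # Toy data: a tight cluster near the origin plus one distant instance.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [9, 9]]
    y = [0, 0, 0, 0, 0, 1]
    X_kept, y_kept = reduce_by_distance(X, y, k=1, ratio=1 / 6)
    print(X_kept, y_kept)  # the distant [9, 9] instance has been dropped
```

Labels are carried along unchanged, so the reduced set can be passed directly to any classifier. Sweeping `ratio` over several values mirrors the "different proportions" the abstracts describe. The outcome of K-means depends on the initial centers, so a real experiment would likely use several restarts or a library implementation.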
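Acknowledgments
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
1.4 Research Process
Chapter 2 Literature Review
2.1 Bankruptcy Prediction
2.2 Data Mining
2.2.1 Definition of Data Mining
2.2.2 Data Mining Methods
2.2.3 Steps of Data Mining
2.3 Pre-Processing
2.3.1 Data Reduction
2.3.2 Feature Selection
2.4 Classification Techniques
2.4.1 Neural Networks
2.4.2 Decision Trees
2.4.3 Support Vector Machines
2.4.4 Logistic Regression
2.5 Clustering Techniques
2.5.1 K-means
2.6 Related Work on Applying Data Mining to Bankruptcy Prediction
Chapter 3 Research Methodology
3.1 Research Design and Framework
3.2 Data Sources
3.3 Data Reduction Procedure and Method
3.4 Building the Prediction Models
3.4.1 Model-Building Procedure
3.4.2 10-Fold Cross-Validation
3.4.3 Classifier Settings
3.5 Model Evaluation Methods
Chapter 4 Experimental Results and Analysis
4.1 Analysis and Comparison of the Models' Average Accuracy
4.1.1 Baseline Average Accuracy
4.1.2 Average Accuracy after Data Reduction at Different Proportions
4.1.3 Analysis of Average Accuracy
4.2 Type I and Type II Error Rates
4.3 Further Discussion
4.3.1 Analysis of Distance Values
4.3.2 Discussion and Validation of the Reduction Proportion Yielding the Highest Average Accuracy for Each Dataset
Chapter 5 Conclusions and Suggestions
5.1 Conclusions
5.2 Contributions and Suggestions
5.2.1 Research Contributions
5.2.2 Future Research Directions and Suggestions
References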
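The evaluation outlined in Sections 3.4.2 and 4.2 covers two things: average accuracy under 10-fold cross-validation, and Type I and Type II error rates. Both can be sketched as follows. This is an illustrative sketch, not the thesis's code. The fold split is unstratified. `fit_predict` and `majority_class` are hypothetical placeholders for the four classifiers. The Type I/Type II convention used here is an assumption, since the record does not give the thesis's definition: Type I means a failed firm is classified as healthy, and Type II means a healthy firm is classified as failed.

```python
import random


def k_fold_indices(n, folds=10, seed=0):
    """Shuffle indices 0..n-1 and deal them round-robin into `folds` test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::folds] for f in range(folds)]


def error_rates(y_true, y_pred, failed=1):
    """Return (accuracy, Type I rate, Type II rate) for one set of predictions.

    Assumed convention: Type I = a failed firm predicted as healthy,
    Type II = a healthy firm predicted as failed; `failed` marks bankrupt firms.
    """
    pairs = list(zip(y_true, y_pred))
    n_failed = sum(1 for t, _ in pairs if t == failed)
    n_healthy = len(pairs) - n_failed
    type1 = sum(1 for t, p in pairs if t == failed and p != failed)
    type2 = sum(1 for t, p in pairs if t != failed and p == failed)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return (accuracy,
            type1 / n_failed if n_failed else 0.0,
            type2 / n_healthy if n_healthy else 0.0)


def cross_validate(X, y, fit_predict, folds=10, seed=0):
    """Mean accuracy over k-fold cross-validation.

    `fit_predict(X_train, y_train, X_test)` is a hypothetical stand-in for
    any of the four classifiers; it must return one prediction per test row.
    """
    accs = []
    for test in k_fold_indices(len(X), folds, seed):
        if not test:
            continue  # fewer instances than folds leaves some folds empty
        test_set = set(test)
        train = [i for i in range(len(X)) if i not in test_set]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        accs.append(error_rates([y[i] for i in test], preds)[0])
    return sum(accs) / len(accs)


def majority_class(X_train, y_train, X_test):
    """Trivial baseline classifier: always predict the most common training label."""
    guess = max(set(y_train), key=y_train.count)
    return [guess] * len(X_test)


if __name__ == "__main__":
    X = [[i] for i in range(20)]
    y = [0] * 14 + [1] * 6
    print(cross_validate(X, y, majority_class))
    print(error_rates([1, 1, 1, 0, 0], [1, 0, 0, 0, 1]))
```

A majority-class baseline like this one is a useful sanity check on imbalanced bankruptcy data. High accuracy can hide a large Type I error rate, which is one reason to report both error types alongside accuracy. In practice, scikit-learn's `StratifiedKFold` and `cross_val_score` cover the same ground and keep the class ratio equal across folds.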