National Digital Library of Theses and Dissertations in Taiwan

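Graduate student: 林宸翊
Graduate student (English name): Cheng-I Lin
Thesis title: 應用於行為評等之Randomforests及其變數選擇法
Thesis title (English): Application of the Random Forests and Its Variable Selection Method to the Building of Behavior Rating System
Advisors: 黃孝雲, 莊瑞珠
Advisors (English names): Hsiao-Yun Huang, Rwei-Ju Chuang
Degree: Master's
Institution: Fu Jen Catholic University (輔仁大學)
Department: Graduate Institute of Applied Statistics
Discipline: Mathematics and Statistics
Field: Statistics
Thesis type: Academic thesis
Year of publication: 2009
Graduation academic year: 97 (ROC calendar, i.e. 2008–2009)
Language: Chinese
Number of pages: 132
Keywords (Chinese): 行為評等 (behavior rating), 變數選擇 (variable selection), random forests, wrapper
Keywords (English): behavior rating, variable selection, random forests, wrapper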
Usage statistics:
  • Cited: 11
  • Views: 871
  • Downloads: 145
  • Bookmarks: 1
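Chinese abstract (translated): After the credit card and cash card ("dual-card") crisis broke out in Taiwan in 2005 (ROC year 94), banks' bad-debt risk increased, and credit card risk control became a priority for major financial institutions. Current credit rating research focuses mostly on applicant rating, and little attention has been paid to behavior rating systems that continue to track and manage customers after their cards are issued. This study examines market segmentation of credit customers and divides them into four classes by repayment behavior: normal repayment, paying revolving interest, accounts 30–90 days overdue, and accounts 90–180 days overdue. Building a behavior rating system requires not only customers' application data but also their subsequent repayment records; compared with traditional applicant rating, the data have more dimensions and cost more to collect, so selecting the significant variables efficiently from many candidates is a key management issue. This study builds behavior rating models with Random forests, a relatively new algorithm that is easy to model with, achieves relatively high accuracy, and provides an importance index for each behavioral variable. Because Random forests has no built-in variable selection mechanism, this study applies the wrapper concept and uses the Random forests importance index to construct a Random forests variable selection method. It also compares behavior rating models for credit card customers built with four other classifiers: logistic regression, discriminant analysis, neural networks, and support vector machines. The results show that, for behavior rating, the Random forests algorithm effectively screens modeling variables and gives better classification accuracy; a decision tree method is then used to carry out practical market segmentation across the multiple credit behavior classes.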
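The wrapper procedure above is only summarized on this page. One common way to realize it, sketched here as an assumption in the spirit of importance-based backward elimination, is to refit a random forest, drop the least important variables, score each remaining subset with the forest itself, and keep the best-scoring subset. The sketch uses scikit-learn's `RandomForestClassifier` and its impurity-based `feature_importances_`; the helper name `wrapper_rf_selection`, the 20% drop fraction, the 5-fold cross-validation, and the synthetic four-class data (standing in for the four repayment-behavior classes) are all hypothetical, not the thesis's own settings or data.

```python
"""Sketch of wrapper-style variable selection driven by Random forests
importance. Hypothetical reconstruction, not the thesis's exact algorithm."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def wrapper_rf_selection(X, y, drop_frac=0.2, min_features=2, cv=5,
                         n_estimators=100, seed=0):
    """Hypothetical helper: backward elimination by RF importance.

    Each round scores the current variable subset by cross-validated
    accuracy (the wrapper step), then drops the least important
    fraction of variables. Returns the best subset, its score, and
    the per-round history of (subset, score).
    """
    remaining = list(range(X.shape[1]))
    history = []
    while True:
        rf = RandomForestClassifier(n_estimators=n_estimators,
                                    random_state=seed)
        # Wrapper step: evaluate the subset with the classifier itself.
        score = cross_val_score(rf, X[:, remaining], y, cv=cv).mean()
        history.append((list(remaining), score))
        if len(remaining) <= min_features:
            break
        # Rank the current variables by impurity-based importance.
        rf.fit(X[:, remaining], y)
        order = np.argsort(rf.feature_importances_)  # least important first
        n_drop = max(1, int(len(remaining) * drop_frac))
        n_drop = min(n_drop, len(remaining) - min_features)
        dropped = {remaining[i] for i in order[:n_drop]}
        remaining = [f for f in remaining if f not in dropped]
    best_subset, best_score = max(history, key=lambda h: h[1])
    return best_subset, best_score, history


# Synthetic stand-in for behavior rating data: 4 classes, 12 variables.
X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           n_redundant=2, n_classes=4,
                           n_clusters_per_class=1, random_state=42)
subset, acc, hist = wrapper_rf_selection(X, y)
for vars_, s in hist:
    print(f"{len(vars_):2d} variables: CV accuracy {s:.3f}")
print("selected variables:", subset)
```

Choosing the subset with the highest mean accuracy is the simplest rule; a stricter variant would prefer the smallest subset whose accuracy is within one standard error of the best, which favors fewer variables and therefore lower data-collection cost.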
English abstract: Since the outbreak of Taiwan's credit card and cash card crisis in 2005, banks' bad-debt risk has increased, and credit card risk management has become a major issue for almost every financial institution in Taiwan. However, credit rating systems have so far focused mainly on applicant rating, and few behavior rating systems, which continually track and manage customers after their credit cards are issued, have been fully built. To address customer market segmentation, this study divides credit card holders into four categories according to their payment behavior: paying on time, paying revolving interest, 30–90 days overdue, and 90–180 days overdue. Because a behavior rating system requires not only customers' application data but also their subsequent repayment data, the number of variables is much larger than in applicant rating, and the cost of collecting information for risk management rises considerably. Selecting significant variables from a high-dimensional database therefore becomes a critical issue. We adopt Random forests, an emerging classification method, because it models efficiently and accurately and also provides an importance index for each variable. However, Random forests has no built-in variable selection procedure. To address this, we apply the wrapper concept and use the importance index to construct a variable selection algorithm. Logistic regression, discriminant analysis, artificial neural networks, and support vector machines are also used to build behavior rating models of credit card customers for comparison. In our credit rating case study, the proposed Random forests method effectively selected the modeling variables and achieved better classification accuracy.
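The model comparison described in the abstract can be sketched with scikit-learn stand-ins for each method family: `LogisticRegression`, `LinearDiscriminantAnalysis` (discriminant analysis), `MLPClassifier` (a feed-forward neural network), `SVC` (support vector machine), and `RandomForestClassifier`. The hyperparameters, the 5-fold cross-validation, the helper name `compare_models`, and the synthetic four-class data standing in for the four repayment-behavior levels are assumptions for illustration, not the thesis's configuration or results.

```python
"""Sketch: compare the classifier families named in the abstract on a
synthetic four-class problem (assumed data and settings)."""
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Scale-sensitive models get a StandardScaler in front of them.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "discriminant analysis": LinearDiscriminantAnalysis(),
    "neural network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                      random_state=0)),
    "support vector machine": make_pipeline(
        StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
    "random forests": RandomForestClassifier(n_estimators=100,
                                             random_state=0),
}


def compare_models(X, y, cv=5):
    """Hypothetical helper: mean cross-validated accuracy per model."""
    return {name: cross_val_score(model, X, y, cv=cv).mean()
            for name, model in models.items()}


# Class labels 0..3 stand in for the four repayment-behavior levels.
X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           n_redundant=2, n_classes=4,
                           n_clusters_per_class=1, random_state=1)
scores = compare_models(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {acc:.3f}")
```

Logistic regression, the neural network, and the SVM are sensitive to feature scale, so they are wrapped in a scaling pipeline; tree ensembles and linear discriminant analysis do not need it.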
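Table of Contents
Chapter 1 Introduction
Section 1 Research Background
Section 2 Research Motivation
Section 3 Research Objectives
Section 4 Research Framework
Chapter 2 Literature Review
Section 1 Overview of Credit Cards
1. Definition of Credit Cards
2. Origins of Credit Cards
3. Current State of the Domestic Credit Card Market
4. Credit Card Application Data
Section 2 Credit Risk Management
1. Purpose of Credit Risk Management
2. Methods of Credit Risk Assessment
Section 3 Behavior Rating
1. Construction of Behavior Rating Models
Section 4 Related Research on Credit Rating Models
1. Linear and Quadratic Classifiers
2. Logistic Regression
3. Neural Networks
4. Support Vector Machines
Chapter 3 Problem Statement and Improvements
Section 1 Problem Statement
Section 2 Random forests
Section 3 Wrapper Random forests Variable Selection Method
Section 4 Other Classification Methods for Comparison
1. Linear and Quadratic Classifiers
2. Logistic Regression
3. Neural Networks
4. Support Vector Machines
Chapter 4 Empirical Analysis
Section 1 Empirical Example: Dataset 1
1. Characteristics of the Cardholder Data in Dataset 1
2. Application of the Classification Methods
3. Empirical Results
4. Overall Comparison
Section 2 Empirical Example: Dataset 2
1. Characteristics of the Cardholder Data in Dataset 2
2. Application of the Classification Methods
3. Empirical Results
4. Overall Comparison
Section 3 Customer Characteristics under the Behavior Rating Model
1. Classification Table of the Wrapper RF Model
2. Building a Two-Stage Credit Risk Segmentation
Chapter 5 Conclusions and Recommendations
Section 1 Conclusions
Section 2 Recommendations for Banks
Section 3 Recommendations for Future Research
1. Building a Two-Stage Model
2. Validating the Number of Selected Variables
References
1. Chinese References
2. English References
Appendix 1 Results for Each Test Subset on Dataset 1
Appendix 2 Results for Each Test Subset on Dataset 2
Appendix 3 Variable Definitions and Data Coding for Dataset 2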