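Graduate student: 齊玉美
Graduate student (romanized): Yu-Meei Chyi
Thesis title: 不對稱性分類分析之研究
Thesis title (English): Classification Analysis Techniques for Skewed Class
Advisor: 魏志平
Advisor (romanized): Chih-Ping Wei
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Information Management (資訊管理學系研究所)
Discipline: Computer science
Subdiscipline: General computer science
Thesis type: Academic thesis
Year of publication: 2003
Graduation academic year: 91 (ROC calendar)
Language: Chinese
Number of pages: 49
Keywords (translated from Chinese): clustering-based multi-classifier; data mining; random multi-classifier; decision tree; skewed distribution; classification analysis
Keywords (English): Data Mining; Classification Analysis; Skewed Class Distribution Problem; Clustering-based Multi-classifier Class-combiner; Decision Tree Induction; Multi-classifier Class-combiner Approach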
Usage counts:
  • Cited by: 16
  • Views: 229
  • Downloads: 0
  • Saved to reading lists: 0
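Abstract (translated from the Chinese)
Classification analysis techniques in data mining can build prediction models with good classification performance when the class distribution of the data set is balanced. In practice, however (e.g., customer churn prediction and credit card fraud detection), data sets often suffer from the "skewed distribution" problem, in which the classes are very unevenly represented, so the prediction model cannot correctly classify the scarce target instances. The multi-classifier approach, majority reduction, and minority augmentation are the three main methods in the literature for handling skewed class distributions. This study uses data clustering to improve the multi-classifier approach from the literature and proposes a method for constructing a clustering-based multi-classifier. It also applies the nearest-distance, farthest-distance, nearest-average-distance, and farthest-average-distance methods in an attempt to improve how well the majority-reduction method from the literature handles the skewed distribution problem.
This study collected two real data sets that exhibit the skewed distribution problem: burn-injury medical data and customer purchase data from a boutique hypermarket. Using decision-tree-based classifiers, it tested the classification performance of the five proposed methods, with the multi-classifier construction method from the literature as the baseline. Results from ten rounds of sampling validation show that, on both data sets, prediction models built with the clustering-based multi-classifier method at a suitably adjusted class ratio (e.g., 1:2) achieve the best classification performance.

Keywords: data mining, classification analysis, skewed distribution, decision tree, random multi-classifier, clustering-based multi-classifier.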
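
The abstract mentions rebalancing the classes to a ratio such as 1:2 before building multiple classifiers. The following is a minimal sketch of one way this could look, not the thesis's actual procedure: it assumes the majority instances are split at random into subsets, each paired with all minority instances, and that the base classifiers' outputs are merged by simple voting. The function names `partition_majority` and `majority_vote` are hypothetical, and voting is only a stand-in for a trained class-combiner.

```python
import random
from collections import Counter


def partition_majority(majority, minority, ratio=2, seed=0):
    """Split majority instances into random subsets of ratio * len(minority)
    items and pair each subset with ALL minority instances, so every
    training set has a minority:majority ratio of roughly 1:ratio.
    (Assumed procedure for illustration only.)"""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    size = max(1, ratio * len(minority))
    subsets = [shuffled[i:i + size] for i in range(0, len(shuffled), size)]
    return [(list(minority), subset) for subset in subsets]


def majority_vote(predictions):
    """Combine base-classifier labels by simple voting
    (a simplification of a trained class-combiner)."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: 2 minority instances and 98 majority instances.
minority = [("minority", i) for i in range(2)]
majority = [("majority", i) for i in range(98)]

training_sets = partition_majority(majority, minority, ratio=2)
print(len(training_sets))        # number of base classifiers to train
print(len(training_sets[0][1]))  # majority instances in the first training set
```

A clustering-based variant would presumably replace the random shuffle with groups produced by a clustering algorithm; the record does not say how the clusters are formed, so that step is left out here.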
Abstract
Existing classification analysis techniques (e.g., decision tree induction, backpropagation neural networks, and k-nearest neighbor classification) generally exhibit satisfactory classification effectiveness on data with a non-skewed class distribution. However, real-world applications (e.g., churn prediction and fraud detection) often involve decision outcomes that are highly skewed (e.g., 2% churners and 98% non-churners). Such a highly skewed class distribution, if not properly addressed, impairs learning effectiveness and may yield a “null” prediction system that simply assigns every instance to the majority decision class of the training instances (e.g., predicting all customers as non-churners). In this study, we extend the multi-classifier class-combiner approach and propose a clustering-based multi-classifier class-combiner technique to address the highly skewed class distribution problem in classification analysis. In addition, we propose four distance-based methods that select a subset of the majority-class instances in order to lower the degree of skewness in a data set. On two real-world data sets (mortality prediction for burn patients and customer loyalty prediction), empirical results suggest that the proposed clustering-based multi-classifier class-combiner technique generally outperforms the traditional multi-classifier class-combiner approach and the four distance-based methods.

Keywords: Data Mining, Classification Analysis, Skewed Class Distribution Problem, Decision Tree Induction, Multi-classifier Class-combiner Approach, Clustering-based Multi-classifier Class-combiner Approach
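
To make the “null” prediction system concrete, the short sketch below works through the 2% / 98% churn example from the abstract. It is an illustration only: the helper `evaluate` and the choice of accuracy and recall as measures are assumptions, not the evaluation criteria used in the thesis.

```python
def evaluate(actual, predicted, positive="churner"):
    """Return (accuracy, recall on the positive class)."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(a == positive and p == positive for a, p in pairs)
    false_neg = sum(a == positive and p != positive for a, p in pairs)
    correct = sum(a == p for a, p in pairs)
    accuracy = correct / len(pairs)
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return accuracy, recall


# 100 customers: 2% churners and 98% non-churners, as in the abstract.
actual = ["churner"] * 2 + ["non-churner"] * 98

# A "null" system labels every customer with the majority class.
null_predictions = ["non-churner"] * len(actual)

accuracy, recall = evaluate(actual, null_predictions)
print(accuracy, recall)  # prints: 0.98 0.0
```

The null system is right for 98 of 100 customers yet identifies none of the churners, which is why overall accuracy alone can hide the problem on skewed data.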
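Table of Contents
Chapter 1 Introduction
Section 1 Research Background
Section 2 Research Motivation and Objectives
Section 3 Thesis Organization
Chapter 2 Literature Review
Section 1 Review of Classification Analysis Techniques
Section 2 Methods for Classification of Skewed Data
Chapter 3 Improved Methods for Classification of Skewed Data
Section 1 Clustering-based Multi-classifier
Section 2 Distance-based Majority-Reduction Instance Selection
Chapter 4 Empirical Evaluation
Section 1 Data Collection
Section 2 Evaluation Criteria and Procedure
Section 3 Analysis of Empirical Results: Burn-Injury Medical Data
1. Selecting the Best Class Ratio for the Multi-classifier Models
2. Selecting the Best Class Ratio for the Distance-based Majority-Reduction Methods
3. Comparative Analysis of Classifier Performance
Section 4 Analysis of Empirical Results: Boutique Hypermarket Data
1. Selecting the Best Class Ratio for the Multi-classifier Models
2. Selecting the Best Class Ratio for the Distance-based Majority-Reduction Methods
3. Comparative Analysis of Classifier Performance
Chapter 5 Conclusion
Section 1 Summary and Contributions
Section 2 Research Limitations
Section 3 Future Research Directions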