National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

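Graduate student: 張乃楠
Graduate student (English name): Nai-Nan ZHANG
Thesis title: 基於混合方法探究不平衡數據分類問題
Thesis title (English): Research on Unbalanced Data Classification Based on Hybrid Method
Advisor: 簡廷因
Advisor (English name): Ting Ying Chien
Oral defense committee: 黃柏鈞, 張經略
Oral defense committee (English names): Po-Chun Huang, Ching-Lueh Chang
Oral defense date: 2018-06-25
Degree: Master's
Institution: Yuan Ze University (元智大學)
Department: Department of Computer Science and Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2018
Academic year of graduation: 107
Language: Chinese
Number of pages: 52
Keywords (Chinese): 不平衡數據; 數據層方法; 演算法層方法; 混合方法
Keywords (English): Unbalanced Data; Data-level method; Algorithm-level method; Hybrid Method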
Unbalanced data are ubiquitous in real-world datasets. In this thesis, we study unbalanced data distributions in binary classification, where the number of majority-class instances is significantly greater than the number of minority-class instances. Traditional machine learning algorithms aim to minimize empirical risk, so although overall accuracy is high, classification accuracy on the minority class is often sacrificed because minority instances are comparatively scarce. Yet the minority class is often the one of interest. To improve classification on unbalanced data, researchers have proposed data-level methods, such as over-sampling and under-sampling, and algorithm-level methods, such as ensemble learning, cost-sensitive learning, and one-class learning. Building on these methods, we propose a hybrid approach to the unbalanced data problem that comprises five steps: data preprocessing, hierarchical clustering of the data, data rebalancing, model building, and classifier ensembling.
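The abstract's point about overall accuracy hiding poor minority-class performance can be illustrated with a small sketch. Everything here is an assumption made for illustration: the 95:5 toy labels, the helper names `accuracy`, `recall`, and `random_oversample`, and the use of plain random over-sampling as a stand-in for a data-level method. It is not the thesis's hybrid method.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all instances labeled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of instances of `label` that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


def random_oversample(X, y, minority_label, seed=0):
    """Duplicate randomly chosen minority instances until both classes match.

    Assumes a binary problem: every label other than `minority_label`
    is treated as the majority class.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    deficit = (len(y) - len(minority)) - len(minority)
    extra = [rng.choice(minority) for _ in range(max(deficit, 0))]
    return X + extra, y + [minority_label] * len(extra)


# Hypothetical toy data: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100
print(accuracy(y_true, y_pred))   # 0.95
print(recall(y_true, y_pred, 1))  # 0.0

# Rebalance the training data so both classes have 95 instances.
X_bal, y_bal = random_oversample(X, y_true, minority_label=1)
print(Counter(y_bal))
```

The always-majority classifier reaches 95% accuracy while finding none of the minority instances, which is why minority-class recall (or a metric built on it) is more informative than accuracy here. Random over-sampling only duplicates existing points; methods such as SMOTE instead synthesize new minority points.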
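Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
Acknowledgments
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Contributions
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Data-Level Methods
2.1.1 Random Sampling
2.1.2 SMOTE Over-Sampling
2.1.3 Borderline-SMOTE Over-Sampling
2.1.4 SMOTETomek Method
2.1.5 Other Data Rebalancing Methods
2.2 Algorithm-Level Methods
2.2.1 Ensemble Learning
2.2.2 Cost-Sensitive Learning
2.3 SVM (Support Vector Machine)
2.3.1 Hard-Margin Support Vector Machine
2.3.2 Soft-Margin Support Vector Machine
2.3.3 Nonlinear Case
Chapter 3 Research Method
3.1 Algorithm Framework
3.2 Data Preprocessing Techniques
3.3 Hierarchical Clustering
3.4 Data Rebalancing
3.5 Model Building
3.6 Ensemble Learning
3.7 Evaluation Metrics
Chapter 4 Results and Analysis
4.1 Dataset Information
4.2 Experimental Results and Analysis
Chapter 5 Conclusion
References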