National Digital Library of Theses and Dissertations in Taiwan

Graduate student: 陳奕名 (Chen, Yih-Ming)
Thesis title: Borderline SMOTE adaptive boosted decision tree
Advisor: 王秀瑛 (Wang, Hsiu-Ying)
Committee members: 黎正中 (Li, Cheng-Chong), 陳鄰安 (Chen, Lin-An), 鄭少為 (Cheng, Shao-Wei), 王秀瑛 (Wang, Hsiu-Ying)
Oral defense date: 2016-01-15
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Statistics
Discipline: Mathematics and Statistics
Field: Statistics
Thesis type: Academic thesis
Publication year: 2016
Graduation academic year: 104 (ROC calendar)
Language: Chinese

Keywords: SMOTE, Borderline SMOTE, SMOTE Boosting, imbalanced dataset

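Abstract:

Imbalanced data has long degraded the performance of classifiers, and many researchers have worked on the problem. The resulting solutions range from simple sampling methods to cost-sensitive learning and ensemble methods. These methods often lose information or overfit. Overfitting arises because repeatedly sampling from a limited set of points restricts the classifier's decision boundary, which is why many improved sampling algorithms have been derived from SMOTE. Sampling and cost-sensitive methods have also been embedded in the adaptive boosting (Adaptive Boosting) framework to broaden the classifier's boundary. Because the SMOTE step embedded in the early SMOTE Boosting method does not account for noise points in the data, this thesis replaces it with the more reliable Borderline SMOTE, so that new synthetic samples take the data distribution into account and noise points are excluded. Finally, we implement Borderline SMOTE Boosting and compare it with SMOTE Boosting and plain Boosting.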

Abstract (English):

The problem of learning from imbalanced data has received growing attention. Because imbalanced data can degrade classifier performance, many researchers have worked in this area and proposed many solutions, such as combining SMOTE (Synthetic Minority Over-sampling Technique) with a decision tree. In this study, we review existing methods including SMOTE, Borderline SMOTE, Adaptive Boosting and SMOTE Boosting. To improve on these methods, we propose an approach called Borderline SMOTE Boosting. The approach is compared with the existing methods on three real data sets. The results show that the proposed method yields better results.
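The sampling step described in the abstract, generating synthetic minority points only near the class border and skipping noise points, can be illustrated with a minimal sketch. It follows the Borderline-SMOTE1 rule published by Han et al. (2005), not the thesis's own code, which this record does not include. The function names, the toy data and the parameters `m`, `k` and `per_point` are assumptions made for illustration.

```python
import math
import random


def classify_minority(minority, majority, m):
    """Tag each minority point 'noise', 'danger' or 'safe' using its m nearest neighbours."""
    labelled = [(p, 1) for p in minority] + [(q, 0) for q in majority]
    kinds = []
    for i, p in enumerate(minority):
        # Neighbours come from the whole training set, excluding p itself.
        others = labelled[:i] + labelled[i + 1:]
        nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:m]
        n_majority = sum(1 for _, label in nearest if label == 0)
        if n_majority == m:
            kinds.append("noise")    # surrounded by the majority class
        elif 2 * n_majority >= m:
            kinds.append("danger")   # at least half majority: near the border
        else:
            kinds.append("safe")
    return kinds


def borderline_smote(minority, majority, m=3, k=2, per_point=2, seed=0):
    """Create synthetic minority points by interpolating from 'danger' points only."""
    rng = random.Random(seed)
    kinds = classify_minority(minority, majority, m)
    synthetic = []
    for i, (p, kind) in enumerate(zip(minority, kinds)):
        if kind != "danger":
            continue  # safe and noise points produce no synthetic samples
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda q: math.dist(p, q))[:k]
        if not neighbours:
            continue
        for _ in range(per_point):
            q = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from p to q
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return kinds, synthetic


# Toy data (hypothetical): a minority cluster, one border point, one isolated point.
minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.4, 0.5), (6.0, 6.0)]
majority = [(3.0, 0.0), (3.0, 1.0), (4.0, 0.0), (4.0, 1.0), (5.0, 5.0),
            (6.0, 5.0), (5.0, 6.0), (7.0, 6.0), (6.0, 7.0)]

kinds, synthetic = borderline_smote(minority, majority)
print(kinds)
print(synthetic)
```

In this toy set only the point (2.4, 0.5) is tagged as a danger point, so every synthetic sample lies on a segment between it and one of its two nearest minority neighbours. The isolated minority point at (6.0, 6.0) is tagged as noise and contributes nothing. Plain SMOTE would instead interpolate from every minority point, including that isolated one.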

Table of Contents
Chinese Abstract
English Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
1 Research Motivation
2 Literature Review
3 Research Content and Methods
3.1 Sampling Methods
3.1.1 SMOTE
3.1.2 Borderline SMOTE
3.2 Algorithms
3.2.1 Decision Tree
3.2.2 Adaptive Boosting
3.2.3 SMOTE Boosting
3.3 Borderline SMOTE Boosting decision tree
3.4 Evaluation Criteria
3.5 Validation and Parameter Selection
3.5.1 Cross Validation
3.5.2 Grid Search
4 Real Data Applications and Simulation Study
4.1 Data Set 1
4.2 Data Set 2
4.3 Data Set 3
4.4 Conclusion
5 Conclusions and Future Work
References
Appendix 1 SMOTE
Appendix 2 Borderline SMOTE
Appendix 3 Adaptive boosted
Appendix 4 SMOTE Boosting
Appendix 5 Borderline SMOTE Boosting

List of Tables
Table 1 Simulation results for Data Set 1
Table 2 Simulation results for Data Set 2
Table 3 Simulation results for Data Set 3
Table 4 F1 scores and computation times for the three data sets

List of Figures
Figure 1 Decision tree example
Figure 2 Confusion matrix

