National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: Sun, Li-Yang (孫立洋)
Title: Using Repeated Instance Selection to Reduce the Impact of Noise Data on Text Classification (以重複樣本選擇降低噪音資料對文章分類的影響)
Advisor: Chen, Yaw-Huei (陳耀輝)
Degree: Master's
Institution: National Chiayi University
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Publication year: 2021
Academic year of graduation: 109 (2020-2021)
Language: Chinese
Pages: 58
Keywords (Chinese): 文章分類, 噪音資料, 強健訓練, 樣本選擇
Keywords (English): text classification, noise data, robust training, instance selection
Statistics:
  • Cited by: 0
  • Views: 115
  • Downloads: 6
  • Bookmarked: 0
Abstract:
Training a deep learning model requires a large amount of training data, but data collected in practice inevitably contain noisy instances, and this noise can seriously degrade the model's classification accuracy. We propose a multi-stage instance selection method to enhance the model's ability to resist noisy training data. At each stage, we select the instances whose given labels are consistent with the predicted labels, together with a small portion of the instances whose labels are inconsistent with the predictions, as the training data for the next stage. Keeping the consistently labeled data preserves most of the clean instances, while adding a small portion of inconsistently labeled data from the previous stage increases the size and diversity of the training set. Experimental results show that our method improves the robustness of models trained on four data sets with various noise ratios, thereby reducing the influence of noisy training data on the models.
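The selection rule described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function name `select_instances` and the `keep_ratio` parameter are assumptions, and a real pipeline would retrain the classifier on the selected instances at every stage.

```python
import numpy as np

def select_instances(labels, predictions, keep_ratio=0.1, rng=None):
    """One selection stage: keep every instance whose given label matches
    the current model's prediction, plus a small random portion of the
    mismatched instances to preserve training-set size and diversity."""
    if rng is None:
        rng = np.random.default_rng(0)
    labels = np.asarray(labels)
    predictions = np.asarray(predictions)
    consistent = np.flatnonzero(labels == predictions)
    inconsistent = np.flatnonzero(labels != predictions)
    n_extra = int(len(inconsistent) * keep_ratio)
    extra = rng.choice(inconsistent, size=n_extra, replace=False)
    # Indices of the instances used to train the next stage's model
    return np.sort(np.concatenate([consistent, extra]))
```

For example, with labels `[0, 1, 1, 0, 1]` and predictions `[0, 1, 0, 0, 0]`, the consistent indices 0, 1, and 3 are always kept, and with `keep_ratio=0.5` one of the two inconsistent indices (2 or 4) is added.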
Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Acknowledgments iii
Table of Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Objectives 1
1.3 Contributions 1
1.4 Thesis Organization 2
Chapter 2 Related Work 3
2.1 Data Preprocessing 3
2.2 Robust Training 5
Chapter 3 Methodology 16
3.1 Method Concept 16
3.2 Method Procedure 16
Chapter 4 Experiments 26
4.1 Data Sets 26
4.2 Experimental Environment 28
4.3 Experimental Design 28
4.3.1 Evaluation Metrics 40
4.3.2 Noise Injection Method 40
4.3.3 Training 41
4.3.4 Testing 44
4.4 Experimental Results 44
4.5 Discussion 51
Chapter 5 Conclusions and Future Work 54
References 56