臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Record Detail

Author: 高偉城 (Kao, Wei-Cheng)
Title: 基於專家學習之集成知識蒸餾 (Multi-Teacher Ensemble Distillation via Specific Expert Learning)
Advisor: 林智揚 (Lin, Chih-Yang)
Committee: 林智揚 (Lin, Chih-Yang); 李建誠 (Lee, Chien-Cheng); 蘇柏齊 (Su, Po-Chyi)
Defense date: 2020-11-04
Degree: Master's
Institution: 元智大學 (Yuan Ze University)
Department: 電機工程學系乙組 (Department of Electrical Engineering, Group B)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2020
Graduation academic year: 109
Language: English
Pages: 48
Chinese keywords: 知識蒸餾 (knowledge distillation)
Keywords: knowledge distillation; ensemble; model compression
Usage statistics:
  • Cited: 0
  • Views: 175
  • Downloads: 5
  • Bookmarked: 0
Abstract (translated from Chinese): In recent years, deep neural network models have attracted growing attention, and deep learning has steadily entered the public eye. Whether in computer vision or speech, deep networks have shown excellent performance in practical applications, which has made deep learning ubiquitous in these fields. With new network architectures appearing one after another, accuracy has improved at a rapid pace and the applications of deep learning have broadened. However, in the pursuit of higher accuracy, these networks now commonly exceed 100 layers and carry millions of parameters, and designs tend toward ever greater depth and complexity. Model compression has therefore become a highly practical research area. Knowledge distillation, one such compression technique, allows a small model to approach the accuracy of a large one without adding any parameters. We propose a two-stage knowledge distillation training strategy. The first stage uses specific expert learning, which improves not only the classifier's overall accuracy but also its accuracy on specific classes. In the second stage, we gather the classifiers trained with expert learning into an ensemble and distill the ensemble's output into another network, further improving its accuracy without adding any computation.
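As context for the strategy above, the softened-logit distillation loss of Hinton et al. (2015) is the standard mechanism by which a small student matches a large teacher without gaining parameters. Below is a minimal PyTorch sketch of that loss; the function name and the hyperparameters `T` and `alpha` are illustrative assumptions, not values taken from the thesis.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style knowledge distillation loss (illustrative sketch).

    Mixes a KL term between temperature-softened teacher and student
    distributions with ordinary cross-entropy on the hard labels.
    """
    # Soft targets: KL divergence between softened distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```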
Abstract (English): In recent years, deep learning has shown outstanding performance on visual tasks such as image classification. However, a classifier usually does not achieve the same accuracy across all classes and can be weak on certain specific classes. In this paper, we propose a two-stage knowledge distillation strategy. The first stage introduces a novel knowledge distillation method, Specific Expert Learning (SEL), which improves both the classifier's performance on specific classes and its overall accuracy. We then compose the models trained by SEL into an ensemble with high diversity. The second stage distills the ensemble into a single student network. Experiments show that ResNet-32 achieves 93.34% accuracy with SEL, and an ensemble of several SEL-trained models reaches 94.47% accuracy on CIFAR-10, indicating that the ensemble retains diversity even though all members share the same architecture. After distilling the ensemble into a student network, ResNet-32 achieves 93.56% accuracy on CIFAR-10, 1.14% above the baseline. In conclusion, our experiments demonstrate that the proposed SEL not only improves the classifier's accuracy but also enhances the diversity of the models in the ensemble.
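To make the second stage concrete, the sketch below distills an ensemble of frozen, SEL-trained teachers into a single student by averaging their temperature-softened predictions. Averaging is one common fusion rule, assumed here for illustration; the thesis's exact fusion of expert outputs may differ, and all names and hyperparameters are hypothetical.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_soft_targets(teachers, x, T=4.0):
    """Average the temperature-softened class probabilities of several frozen teachers."""
    probs = [F.softmax(t(x) / T, dim=1) for t in teachers]
    return torch.stack(probs).mean(dim=0)

def stage2_loss(student, teachers, x, labels, T=4.0, alpha=0.9):
    """Distill the ensemble's averaged soft targets into one student network."""
    student_logits = student(x)
    soft_targets = ensemble_soft_targets(teachers, x, T)
    # KL between the student's softened distribution and the ensemble average.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        soft_targets,
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

A training step would backpropagate `stage2_loss(student, teachers, x, labels)` through the student only, since the teachers' forward passes run under `torch.no_grad()`.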
Table of Contents

Chinese Abstract ii
ABSTRACT iii
Contents iv
List of Figures vi
1 Introduction 1
2 Related work 7
2.1 Model compression approach 7
2.2 Knowledge distillation 9
2.3 Multiple teacher distillation 15
2.4 Ensemble knowledge distillation 17
3 Method 21
3.1 SEL 21
3.2 Ensemble 24
Stage 1 25
Stage 2 26
4 Experiment 28
4.1 Implementation Details 28
4.2 Framework 28
4.3 Dataset 30
4.4 Training Details 30
4.5 SEL evaluation 32
4.6 Training cost evaluation 35
4.7 Classes of expert net 36
4.8 Ensemble evaluation 39
4.9 Two-stage strategy evaluation 41
5 Conclusion 42
6 References 43


