National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 張育誠
Author (English): ZHANG, YU-CHENG
Title: 基於卷積類神經網路之硬體感知壓縮技巧來實現輕量化深度學習
Title (English): Hardware-aware Compression Scheme on Convolutional Neural Networks to Realize Light-weight Deep Learning
Advisor: 陳自強
Advisor (English): CHEN, TZU-CHIANG
Committee Members: 陳自強、吳國瑞、余松年、江瑞秋
Committee Members (English): CHEN, TZU-CHIANG; WU, GUO-ZUA; YU, SUNG-NIEN; CHIANG, JUI-CHIU
Oral Defense Date: 2020-07-28
Degree: Master's
Institution: 國立中正大學 (National Chung Cheng University)
Department: Graduate Institute of Electrical Engineering (電機工程研究所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2020
Academic Year of Graduation: 108
Language: Chinese
Number of Pages: 66
Keywords (Chinese): 輕量化模型、邊緣運算、類神經網路架構搜索、一站式類神經網路架構搜索、極限梯度提升決策樹、軟濾波剪枝、混合深度卷積核
Keywords (English): light-weight models, edge computing, Neural Architecture Search, One-Shot NAS, XGBoost, Soft filter pruning, Mixed Depthwise Convolutional Kernels
Statistics:
  • Cited: 0
  • Views: 224
  • Downloads: 0
  • Bookmarked: 0
Light-weight models are the trend in deep learning, and many of today's light-weight models are realized through Neural Architecture Search (NAS) and pruning. On the NAS side, Reinforcement Learning NAS (RL-NAS) can treat the model's channel count (width), layer count (depth), and the various convolution operation types as its search space, but every search requires retraining the entire network architecture, which is very time-consuming. One-Shot NAS avoids this retraining through weight sharing, yet it cannot search over the channel or layer counts; it can only search among the different convolution operation types. However, the channel and layer counts are precisely the factors that determine the parameter count and the number of floating-point operations (FLOPs), the key indicators of a light-weight model.
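
The weight-sharing idea described above can be pictured as a single supernet layer whose candidate operations are trained together, so that no sampled architecture has to be retrained from scratch. The sketch below is only an illustration of that idea in PyTorch, not the thesis's implementation; the class name, kernel sizes, and the uniform sampling scheme are assumptions made for this example.

```python
# Illustrative sketch of weight sharing in One-Shot NAS (assumed names, not the
# thesis's code). Every sampled sub-architecture reuses the supernet's weights
# instead of being trained from scratch.
import random

import torch
import torch.nn as nn


class SharedMixedDepthwise(nn.Module):
    """A supernet layer whose candidate depthwise kernels are trained jointly."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # All candidate operations live inside one supernet layer.
        self.candidates = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )

    def forward(self, x, choice=None):
        # Supernet training: sample a random candidate at each step.
        # Architecture search: evaluate a fixed choice with the shared weights,
        # so no per-architecture retraining is needed.
        if choice is None:
            choice = random.randrange(len(self.candidates))
        return self.candidates[choice](x)


layer = SharedMixedDepthwise(channels=32)
x = torch.randn(1, 32, 56, 56)
print(layer(x).shape)            # random candidate, as during supernet training
print(layer(x, choice=1).shape)  # fixed candidate, as during evaluation
```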

This thesis proposes three deep-learning compression methods. The first improves One-Shot NAS, whose search is already faster than RL-NAS, so that the channel count can also be searched, and applies it to Mixed Depthwise Convolutional Kernels (MixConv). The second, building on the first, uses eXtreme Gradient Boosting (XGBoost) decision-tree regression to build a hardware-aware model; from the search results and the latencies actually measured on each platform, it predicts additional neural network architectures that satisfy the hardware platform's latency constraints, which are then retrained on the target dataset. The third is a new channel-sparsification method that applies a Tanhshrink function after the Batch Normalization γ parameter, making channels easier to prune; the same method determines the proportion of each kernel type in MixConv, and the unnecessary channels in MixConv are finally removed, making the model even more light-weight.
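
One plausible reading of the third method, applying Tanhshrink after the Batch Normalization γ parameter so that small channel scales are pushed toward zero, is sketched below in PyTorch. The helper name, threshold, and channel count are hypothetical; in the thesis this signal feeds into soft filter pruning of the MixConv blocks.

```python
# Illustrative sketch only: one plausible reading of "adding a Tanhshrink function
# after the Batch Normalization gamma parameter" as a channel-sparsity signal.
# The helper name, threshold, and channel count are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


def channel_keep_mask(bn, threshold=1e-2):
    """Return a boolean mask of channels worth keeping.

    Tanhshrink(x) = x - tanh(x) behaves like x**3 / 3 near zero, so channels
    whose BN scale factor gamma is already small are squashed even closer to
    zero, which makes low-importance channels easier to identify and prune.
    """
    gamma = bn.weight.detach()              # per-channel scale factors
    score = F.tanhshrink(gamma).abs()
    return score > threshold                # True = keep, False = prune


bn = nn.BatchNorm2d(64)                     # e.g. the BN inside one MixConv branch
keep = channel_keep_mask(bn)
print(f"keeping {int(keep.sum())} of {keep.numel()} channels")
```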

Light-weight models are a growing trend in deep learning, and many of them are built with Neural Architecture Search (NAS) and pruning methods. Among NAS approaches, Reinforcement Learning NAS (RL-NAS) can take the model's channels (width), layers (depth), and the various convolution operation types as its search space, but it is very time-consuming because each search requires retraining the entire network architecture. Although One-Shot NAS removes this retraining step through weight sharing, it can only search over the convolution operation types, not over the number of channels or layers of the model. Yet the numbers of channels and layers are the key factors affecting the number of weight parameters and floating-point operations (FLOPs), which are the main indicators of a light-weight model.

This thesis proposes three deep-learning compression methods. First, it improves One-Shot NAS, which is already faster than RL-NAS, so that the number of channels can also be searched, and applies it to Mixed Depthwise Convolutional Kernels (MixConv). Second, it uses eXtreme Gradient Boosting (XGBoost) decision-tree regression to build a hardware-aware model, which predicts additional neural network architectures that meet the latency of the target hardware device; these architectures are then retrained on the application dataset. Last, it proposes a new channel-level sparsity method that applies a Tanhshrink function after the γ parameter of Batch Normalization to make channels easier to prune; the same method determines the ratio of each kernel type in MixConv, and the unnecessary channels are finally pruned to make the model even more light-weight.
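
As a rough illustration of the hardware-aware model, the sketch below fits an XGBoost regressor that maps an architecture encoding to latency measured on a target device. The feature layout and all numbers are hypothetical placeholders, not data from the thesis.

```python
# Rough illustration of a hardware-aware latency predictor built with XGBoost
# regression. Feature encoding and values below are hypothetical placeholders.
import numpy as np
import xgboost as xgb

# Each row encodes one sampled architecture:
# [width multiplier, number of layers, share of 3x3 / 5x5 / 7x7 kernels, FLOPs (M)]
X_train = np.array([
    [0.50, 12, 0.6, 0.3, 0.1, 120.0],
    [0.75, 16, 0.4, 0.4, 0.2, 210.0],
    [1.00, 20, 0.3, 0.4, 0.3, 350.0],
])
y_train = np.array([8.1, 14.6, 23.9])   # latency (ms) measured on the device

model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

# Predict the latency of an unmeasured candidate before deciding to retrain it.
candidate = np.array([[0.75, 14, 0.5, 0.3, 0.2, 180.0]])
print("predicted latency (ms):", float(model.predict(candidate)[0]))
```
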
Acknowledgements
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
1.1 Foreword
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2  Research Background
2.1 Background
2.2 Light-weight Models
2.2.1 Model Connection Schemes
2.2.2 Convolution Operation Types
2.2.3 Convolution Block Composition
2.3 Neural Architecture Search Methods
2.3.1 Reinforcement Learning NAS
2.3.2 Differentiable NAS (DARTS)
2.3.3 One-Shot NAS
2.4 Pruning
2.4.1 Batch Normalization
2.4.2 L1 and L2 Regularization
2.4.3 Hard Pruning and Soft Pruning
2.4.4 Weight Pruning and Channel Pruning
2.4.5 AtomNAS
2.4.6 Multi-objective Pruning
Chapter 3  Research Methods
3.1 One-Shot NAS with Searchable Channel Numbers
3.1.1 Search Space
3.1.2 Supernet Design
3.1.3 Hardware-aware Model
3.1.4 Search Method and Workflow
3.2 Pruning
3.2.1 Model Sparsification
3.2.2 Pruning Workflow
Chapter 4  Experimental Results
4.1 Experimental Environment and Procedure
4.1.1 Experimental Environment and Datasets
4.1.2 Procedure: Search and Training
4.1.3 Procedure: Soft Filter Pruning
4.1.4 Procedure: Parameter Settings
4.2 Neural Architecture Search Results
4.2.1 Searched Models
4.2.2 Hardware-aware Results
4.3 Best Models Found by the Search
4.3.1 Search Options
4.3.2 Muti-Shot NAS_A
4.3.3 Muti-Shot NAS_B
4.3.4 Muti-Shot NAS_C
4.3.5 Muti-Shot NAS_D
4.3.6 Muti-Shot NAS_E
4.3.7 Muti-Shot NAS_F
4.3.8 Muti-Shot NAS_G
4.3.9 Muti-Shot NAS_H
4.3.10 Muti-Shot NAS_I
4.4 Accuracy Comparison Between the Searched Architectures and the Original Architecture
Chapter 5  Conclusions and Future Work
References

Electronic Full Text (publicly available online from 2025-08-27)