臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Student: 陳志勝
Student (English): Chen, Chih-Sheng
Title (Chinese): 提供高效平行推論之具標的感知與免數據集模型壓縮系統
Title (English): Data-Free Target-Aware Model Compression System for Efficient Parallel Inference
Advisor: 陳添福
Advisor (English): Chen, Tien-Fu
Committee: 陳中和、賴伯承、張貴忠、陳添福
Committee (English): Chen, Chung-Ho; Lai, Po-Cheng; Chang, Kuei-Chung; Chen, Tien-Fu
Defense date: 2020-07-21
Degree: Master's
Institution: 國立交通大學 (National Chiao Tung University)
Department: 資訊科學與工程研究所 (Institute of Computer Science and Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2020
Graduation academic year: 108 (2019–2020)
Language: English
Pages: 50
Keywords (Chinese): 免數據集; 模型壓縮; 平行推論
Keywords (English): data-free; model compression; parallel inference
Usage statistics:
  • Times cited: 0
  • Views: 185
  • Downloads: 0
  • Bookmarked: 0
Abstract (translated from Chinese):
In response to the trend toward the democratization of AI, many industries have begun applying deep learning models to their own problems. These companies, however, often overlook the importance of optimization, which quietly inflates their costs. At the same time, data-privacy concerns make companies reluctant to hand their data to external model-optimization services. We therefore propose a data-free model optimization system that accelerates inference given nothing but a pre-trained neural network. Our method not only outperforms other state-of-the-art (SOTA) approaches but also requires comparatively few computing resources. Furthermore, when optimizing cloud-scale inference services, traditional methods cannot cope with highly dynamic resource scheduling, so we introduce a new metric, scalability, to guide further optimization of models in the cloud. Finally, to capture finer-grained variations in hardware behavior, we profile the utilization of each hardware resource during inference and use these profiles to build a more precise model optimization system.
Abstract (English):
In response to the democratization of AI, a large number of industries have begun to apply deep learning networks to solve their problems. However, they usually ignore the importance of optimizing these applications, which significantly raises the cost of their AI solutions. In addition, data privacy is a serious concern: much critical training data cannot be shared with an external optimization service. We therefore propose a new data-free, target-aware model compression system that optimizes a given pre-trained model without any of the original training data. In our experiments, the proposed methods not only outperform SOTA data-free compression methods but also use fewer computing resources. Furthermore, to make models in the cloud more tolerant of runtime variance, we introduce a novel metric, scalability, to guide model optimization for cloud deployment. Finally, to capture finer-grained differences between compressed models, we profile hardware counters during inference and use them for more precise optimization.
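The data-free fine-tuning pipeline summarized above (an image generator that imitates the original training set, followed by knowledge distillation; see Sections 3.2.1 and 3.2.2 in the table of contents below) can be illustrated with a short PyTorch sketch. This is a minimal, hypothetical rendition, not the thesis's implementation: the generator is trained to produce images whose batch-norm statistics match those stored in the frozen teacher, and the compressed student is then distilled on the same synthetic batch. All names here (bn_statistic_loss, distillation_step, the temperature T) are assumptions made for this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

def bn_statistic_loss(teacher: nn.Module, bn_inputs: dict) -> torch.Tensor:
    """Gap between the running BN statistics stored in the pre-trained
    teacher and the statistics of the current synthetic batch."""
    loss = torch.zeros(())
    for name, m in teacher.named_modules():
        if isinstance(m, nn.BatchNorm2d) and name in bn_inputs:
            x = bn_inputs[name]                      # input seen by this BN layer
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)
            loss = loss + F.mse_loss(mean, m.running_mean) \
                        + F.mse_loss(var, m.running_var)
    return loss

def distillation_step(generator, teacher, student, opt_g, opt_s,
                      z_dim=128, batch=64, T=4.0):
    """One data-free training step: update the generator, then the student.
    The teacher is assumed frozen (requires_grad_(False)) and in eval() mode."""
    # Capture the input of every BN layer in the teacher with forward hooks.
    bn_inputs, hooks = {}, []
    for name, m in teacher.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(
                lambda mod, inp, out, key=name: bn_inputs.__setitem__(key, inp[0])))

    z = torch.randn(batch, z_dim)
    fake = generator(z)                              # synthetic "training" images
    t_logits = teacher(fake)

    # 1) Generator update: make the synthetic batch statistically plausible
    #    to the teacher's BN layers; no real data is ever touched.
    g_loss = bn_statistic_loss(teacher, bn_inputs)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # 2) Student update: classic KD loss against the teacher's soft targets
    #    on the same (detached) synthetic batch.
    s_logits = student(fake.detach())
    kd_loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                       F.softmax(t_logits.detach() / T, dim=1),
                       reduction="batchmean") * T * T
    opt_s.zero_grad(); kd_loss.backward(); opt_s.step()

    for h in hooks:
        h.remove()
    return g_loss.item(), kd_loss.item()

The design intuition is that, in a data-free setting, the BN running statistics are the only distributional fingerprint of the original training set that survives inside the model, which is why matching them (as prior work such as DeepInversion also does) is a natural generator objective; the thesis's scalability metric and hardware-counter profiling then steer which compressed candidate to keep.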
Abstract (Chinese) i
ABSTRACT ii
Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
I. Introduction 1
1.1 Motivation 1
1.2 Contribution 2
II. Background & Related Work 3
2.1 Optimization methods for efficient parallel inference 3
2.1.1 Static optimization 3
2.1.2 Runtime scheduling 5
2.2 Data-free model learning 7
2.2.1 Synthetic dataset 8
2.2.2 Knowledge transfer 10
2.3 AutoML 11
2.3.1 Multi-objective function 12
2.3.2 Search algorithm 14
2.4 Summary 15
III. Data-Free Target-Aware Compression 17
3.1 Target-aware automatic model compression 18
3.1.1 Reflect the parallel efficiency by scalability 19
3.1.2 Formulate a specific multiple-objective function 21
3.2 Data-free network fine-tuning 24
3.2.1 Image generator to imitate original training set 25
3.2.2 Data-free fine-tune by knowledge distillation 27
3.3 Improve the fine-tuning efficiency after compression 28
3.3.1 Freeze the batch-norm layers at the beginning 29
3.3.2 Smooth learning rate warmup 30
IV. Improve by Fine-grained Metrics 31
4.1 Encode the time-series metrics information 32
4.2 Automatically select sensitive hardware indicators 32
4.3 Automatically search by these fine-grained metrics 34
V. Experiment 35
5.1 Environment 35
5.1.1 Deployment 35
5.1.2 Measurement stability 36
5.2 Results 37
5.2.1 Improve generator by BN statistics 37
5.2.2 Effect of the BN statistics and Inner distillation 39
5.2.3 Comparison of data-free and normal fine-tuning 40
5.2.4 Freeze BN layers & smooth LR warmup 41
5.2.5 Model compression by area 43
5.2.6 Model compression by slope 45
5.2.7 Model compression by hardware metrics 46
VI. Conclusion 47
References 48