臺灣博碩士論文加值系統

Detailed Record
Author: 黃慶昀
Author (English): HUANG, CHING-YUN
Title: 卷積神經網路於細粒度影像分類之模型剪枝評估
Title (English): Evaluations of Model Pruning of Convolutional Neural Network for Fine-Grained Image Classification
Advisor: 汪群超
Advisor (English): WANG, CHUN-CHAO
Committee: 汪群超、吳漢銘、須上英
Committee (English): WANG, CHUN-CHAO; WU, HAN-MING; SHIU, SHANG-YING
Date of Defense: 2020-06-22
Degree: Master's
Institution: 國立臺北大學 (National Taipei University)
Department: 統計學系 (Department of Statistics)
Discipline: Mathematics and Statistics
Academic Field: Statistics
Thesis Type: Academic thesis
Year of Publication: 2020
Graduation Academic Year: 108
Language: Chinese
Number of Pages: 55
Keywords (Chinese): 卷積神經網路、模型剪枝、細粒度影像分類、雙線性池化
Keywords (English): Convolutional Neural Network, Model Pruning, Fine-Grained Image Classification, Bilinear Pooling
Usage statistics:
  • Cited: 0
  • Views: 262
  • Downloads: 51
  • Bookmarked: 0
In recent years, computer hardware performance has advanced rapidly, encouraging deep learning research to keep adding layers and parameters in pursuit of better predictive performance. Although model performance has improved substantially, these more complex models also carry higher storage and computation costs. When a model is to be deployed on devices with limited hardware resources, such as mobile devices or embedded systems, strict constraints apply to both memory usage and computation time, so simplifying the model becomes the key to bringing deep learning technology into practice.

This study examines how well structured pruning compresses a bilinear convolutional neural network. The experiments use three different pruning pipelines, one-shot, iterative, and soft pruning, and also test the effect of pruning after increasing the penalty coefficient on a single layer. The results show that with one-shot pruning and the heavier penalty setting, the aircraft and car recognition models can shed up to 90% of their parameters, while the more difficult bird recognition model can still shed 80%. The experiments also find that a moderate amount of pruning can, in some cases, further improve model accuracy.
In this paper, we demonstrate that structured pruning is effective through various experiments on a bilinear CNN model for fine-grained image classification. In addition to the regular pruning process, we compare different proposed pruning schemes, such as iterative pruning and soft filter pruning combined with Network Slimming. Further study reveals that a larger L1 regularization penalty on the last batch normalization layer yields a smaller model with better performance.
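Network Slimming (Liu et al., 2017), referenced in the abstract, imposes an L1 penalty on the batch normalization scale factors γ during training so that unimportant channels are driven toward zero and can later be pruned; the "heavier penalty" setting enlarges that penalty on the last batch normalization layer. The following is a minimal PyTorch-style sketch of the idea, an illustration under those assumptions rather than the thesis's actual code, with placeholder coefficient values.

    import torch.nn as nn

    def bn_l1_penalty(model: nn.Module, lam: float = 1e-4, last_lam: float = 1e-3):
        """L1 penalty on BatchNorm scale factors (gamma), Network Slimming style.

        The last BatchNorm layer receives a larger coefficient, mirroring the
        "heavier penalty on a single layer" setting described in the abstract.
        The coefficient values here are illustrative placeholders.
        """
        bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
        penalty = 0.0
        for i, bn in enumerate(bn_layers):
            coeff = last_lam if i == len(bn_layers) - 1 else lam
            penalty = penalty + coeff * bn.weight.abs().sum()  # sum of |gamma| for this layer
        return penalty

    # Typical use inside a training step (criterion, outputs, targets assumed to exist):
    # loss = criterion(outputs, targets) + bn_l1_penalty(model)

Channels whose γ remains near zero after training with such a penalty are the candidates removed by the structured pruning pipelines evaluated in the thesis.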
Table of Contents

1 Introduction 1
1.1 Research Background and Motivation 2
1.2 Research Scope and Objectives 2
1.3 Experimental Procedure 3
2 Literature Review 7
2.1 Parameter Redundancy in Deep Learning 7
2.2 Development of Pruning Methods 8
2.2.1 Unstructured Pruning 8
2.2.2 Structured Pruning 9
2.2.3 Pruning Criteria 10
2.3 Rethinking the Effectiveness of Pruning 13
3 Research Methods 17
3.1 Image Processing 17
3.1.1 Data Augmentation 17
3.1.2 Image Resizing 18
3.1.3 Other Preprocessing 18
3.2 Model Architecture 18
3.2.1 Bilinear Convolutional Neural Network 19
3.2.2 Convolutional Neural Network 21
3.3 Model Compression 25
3.3.1 Model Training 25
3.3.2 Model Pruning 25
3.3.3 Model Fine-Tuning 26
3.4 Model Cost Evaluation 26
3.4.1 Number of Model Parameters 26
3.4.2 Model Computation (FLOPs) 29
4 Experiments 33
4.1 Model Training 33
4.1.1 Datasets 33
4.1.2 Hyperparameter Settings 35
4.1.3 Training Results 37
4.2 Model Pruning 38
4.2.1 One-Shot Pruning 40
4.2.2 Iterative Pruning 42
4.2.3 Soft Pruning 43
4.2.4 Heavier Penalty 45
5 Conclusion 47
References 49


List of Figures

Figure 1.1 Image processing pipeline: one-hot encoding converts the dataset's class labels into Boolean (logical) vectors, which are fed into the loss function together with the model's predicted probabilities during training. 4
Figure 1.2 Model training pipeline 5
Figure 1.3 Model pruning pipeline: in practice it is difficult to remove redundant parameters from a model directly, so implementations usually build a new model backbone, copy the retained parameters into it, and discard the original model, achieving the same effective compression. 6
Figure 2.1 Unstructured pruning: to shorten the index length of the compressed row vectors, that study uses relative indices. A relative index records, after the sparse matrix is flattened into a vector, the distance between each nonzero value and the previous nonzero value; for example, 5.9 lies at distance 2 from the previous nonzero value 4.9, so its relative index is 2 (see the encoding sketch after this list). 8
Figure 2.2 Structured pruning: diagram adapted from the Nervana Systems (2018) article Pruning Filters & Channels, reproduced with permission. 10
Figure 2.3 Fine-grained image classification: the horizontal axis (inter-class similarity) indicates that different classes are highly similar, while the vertical axis (intra-class variation) indicates high variability within a single class. Figure reproduced from Hsu et al. (2019). 14
Figure 3.1 Car images after data augmentation 18
Figure 3.2 Bilinear convolutional neural network (BCNN) 19
Figure 3.3 Convolution operation and max pooling 21
Figure 3.4 Activation functions: the Sigmoid function on the left and the ReLU function on the right. Activation functions are usually inserted between each convolutional layer and pooling layer of the model. 21
Figure 3.5 VGG network structure: each numbered rectangle represents a convolutional layer, and the number indicates how many convolution kernels that layer contains. 22
Figure 3.6 Structured pruning with batch normalization layers: each output channel of a convolutional layer is normalized and then scaled and shifted. 25
Figure 4.1 Sample images from the datasets: all three datasets are balanced, with class sizes differing by only 1 or 2 images. Most images have 3 RGB channels; the few grayscale images were converted to RGB in advance. Image sizes vary, from roughly 2000x1000 down to 100x100, but the vast majority exceed 224x224. 34
Figure 4.2 Distribution of the models' γ parameters: from top to bottom, the γ distributions of the aircraft, bird, and car models, with histograms on the left and cumulative distribution functions on the right. The black bars in the histograms are parameters driven close to 0 by regularization. Panel (f) shows that nearly 40% of the car model's γ parameters have been sparsified, so the car model's performance can be expected to barely degrade at pruning rates below 40% (see the γ-threshold sketch after this list). 39
Figure 4.3 Three pruning pipelines: from top to bottom, the one-shot, iterative, and soft pruning pipelines. Many papers call the process of continuing to train a model after pruning fine-tuning; to avoid confusion with hyperparameter tuning, this study calls it retraining. 40
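The Figure 2.1 caption describes how unstructured pruning stores a flattened sparse weight matrix as (relative index, value) pairs. A minimal self-contained sketch of that encoding is given below, for illustration only; the convention used for the first nonzero entry is an assumption, not taken from the thesis.

    def relative_index_encode(flat_values):
        """Encode a flattened sparse vector as (relative_index, value) pairs.

        Each relative index is the distance from the previous nonzero entry;
        the first nonzero entry is measured from the start of the vector,
        which is one possible convention. Example: 5.9 sits two positions
        after the previous nonzero value 4.9, so its relative index is 2.
        """
        pairs = []
        prev = -1  # position of the previous nonzero entry
        for pos, value in enumerate(flat_values):
            if value != 0:
                pairs.append((pos - prev, value))
                prev = pos
        return pairs

    print(relative_index_encode([4.9, 0, 5.9, 0, 0, 1.2]))
    # [(1, 4.9), (2, 5.9), (3, 1.2)]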
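The Figure 4.2 caption reasons from the γ distribution to a safe pruning rate: channels whose γ has been pushed near zero are removed first, and a global pruning rate corresponds to a quantile threshold over all γ magnitudes. The sketch below illustrates that selection step in PyTorch-style code; the function name and the global-threshold choice are assumptions made for illustration, not necessarily how the thesis implements its pruning.

    import torch
    import torch.nn as nn

    def bn_channel_masks(model: nn.Module, prune_rate: float = 0.4):
        """Choose which BatchNorm channels to keep for a global pruning rate.

        The threshold is the prune_rate quantile of all |gamma| values, so
        prune_rate=0.4 removes the 40% of channels with the smallest scale
        factors (cf. the Figure 4.2 discussion of the car model).
        """
        gammas = torch.cat([m.weight.detach().abs().flatten()
                            for m in model.modules()
                            if isinstance(m, nn.BatchNorm2d)])
        threshold = torch.quantile(gammas, prune_rate)
        return {name: m.weight.detach().abs() > threshold
                for name, m in model.named_modules()
                if isinstance(m, nn.BatchNorm2d)}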


List of Tables

Table 4.1 Basic information of the image datasets 34
Table 4.2 Model hyperparameter settings 35
Table 4.3 Model ablation study 38
Table 4.4 One-shot pruning results 41
Table 4.5 Iterative pruning results 42
Table 4.6 Soft pruning results 45
Table 4.7 One-shot pruning with heavier penalty results 46
References

Ba, J. and Caruana, R. (2014). Do deep nets really need to be deep? In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., and Weinberger, K. Q., editors, Advances in Neural Information Processing Systems 27, pages 2654–2662. Curran Associates, Inc.
Courbariaux, M., Bengio, Y., and David, J.-P. (2015). BinaryConnect: Training deep neural networks with binary weights during propagations. In Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., and Garnett, R., editors, Advances in Neural Information Processing Systems 28, pages 3123–3131. Curran Associates, Inc.
Du, S. S., Lee, J. D., Li, H., Wang, L., and Zhai, X. (2019). Gradient descent finds global minima of deep neural networks. International Conference on Machine Learning, abs/1811.03804.
Falkner, S., Klein, A., and Hutter, F. (2018). BOHB: Robust and efficient hyperparameter optimization at scale. In Dy, J. and Krause, A., editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1437–1446, Stockholmsmässan, Stockholm, Sweden. PMLR.
Frankle, J. and Carbin, M. (2019). The lottery ticket hypothesis: Training pruned neural networks. International Conference on Learning Representations, abs/1803.03635.
Gale, T., Elsen, E., and Hooker, S. (2019). The state of sparsity in deep neural networks. ArXiv:1902.09574.
Gao, Y., Beijbom, O., Zhang, N., and Darrell, T. (2015). Compact bilinear pooling. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, abs/1511.06062.
Haeffele, B. D. and Vidal, R. (2017). Global optimality in neural network training. In IEEE Conference on Computer Vision and Pattern Recognition, pages 4390–4398.
Han, S., Mao, H., and Dally, W. J. (2016). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In Bengio, Y. and LeCun, Y., editors, 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.
Han, S., Pool, J., Tran, J., and Dally, W. J. (2015). Learning both weights and connections for efficient neural networks. Advances in Neural Information Processing Systems, abs/1506.02626.
Hassibi, B., Stork, D. G., and Wolff, G. J. (1993). Optimal brain surgeon and general network pruning. In IEEE International Conference on Neural Networks, pages 293–299 vol. 1.
He, Y. and Han, S. (2018). ADC: Automated deep compression and acceleration with reinforcement learning. European Conference on Computer Vision, abs/1802.03494.
He, Y., Kang, G., Dong, X., Fu, Y., and Yang, Y. (2018). Soft filter pruning for accelerating deep convolutional neural networks. International Joint Conference on Artificial Intelligence, abs/1808.06866.
He, Y., Zhang, X., and Sun, J. (2017). Channel pruning for accelerating very deep neural networks. International Conference on Computer Vision, abs/1707.06168.
Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. ArXiv:1704.04861.
Hsu, Y.-C., Hong, C.-Y., Chen, D.-J., Lee, M.-S., Geiger, D., and Liu, T.-L. (2019). Fine-grained visual recognition with batch confusion norm. ArXiv:1910.12423.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, abs/1502.03167.
Jaderberg, M., Vedaldi, A., and Zisserman, A. (2014). Speeding up convolutional neural networks with low rank expansions. In Proceedings of the British Machine Vision Conference 2014 (BMVC 2014).
Kar, P. and Karnick, H. (2012). Random feature maps for dot product kernels. In Lawrence, N. D. and Girolami, M., editors, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 of Proceedings of Machine Learning Research, pages 583–591, La Palma, Canary Islands. PMLR.
Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., and Tang, P. T. P. (2017). On large-batch training for deep learning: Generalization gap and sharp minima. International Conference on Learning Representations, abs/1609.04836.
Kong, S. and Fowlkes, C. C. (2016). Low-rank bilinear pooling for fine-grained classification. Conference on Computer Vision and Pattern Recognition, abs/1611.05109.
Krause, J., Stark, M., Deng, J., and Fei-Fei, L. (2013). 3D object representations for fine-grained categorization. In 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney, Australia.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C. J. C., Bottou, L., and Weinberger, K. Q., editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc.
LeCun, Y., Denker, J. S., and Solla, S. A. (1990). Optimal brain damage. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2, pages 598–605. Morgan-Kaufmann.
Li, H., Kadav, A., Durdanovic, I., Samet, H., and Graf, H. P. (2016). Pruning filters for efficient ConvNets. International Conference on Learning Representations, abs/1608.08710.
Lin, T., RoyChowdhury, A., and Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. In IEEE International Conference on Computer Vision, pages 1449–1457.
Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., and Zhang, C. (2017). Learning efficient convolutional networks through network slimming. IEEE International Conference on Computer Vision, abs/1708.06519.
Liu, Z., Sun, M., Zhou, T., Huang, G., and Darrell, T. (2018). Rethinking the value of network pruning. International Conference on Learning Representations, abs/1810.05270.
Luo, J., Wu, J., and Lin, W. (2017). ThiNet: A filter level pruning method for deep neural network compression. International Conference on Computer Vision, abs/1707.06342.
Maji, S., Kannala, J., Rahtu, E., Blaschko, M., and Vedaldi, A. (2013). Fine-grained visual classification of aircraft. Technical report.
Molchanov, P., Tyree, S., Karras, T., Aila, T., and Kautz, J. (2016). Pruning convolutional neural networks for resource efficient transfer learning. International Conference on Learning Representations, abs/1611.06440.
Nervana Systems (2019). Distiller: Pruning filters and channels. Accessed: 2020-05-15.
Oguntola, I., Olubeko, S., and Sweeney, C. (2018). SlimNets: An exploration of deep model compression and acceleration. IEEE High Performance Extreme Computing Conference, abs/1808.00496.
Pham, N. and Pagh, R. (2013). Fast and scalable polynomial kernels via explicit feature maps. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, pages 239–247, New York, NY, USA. Association for Computing Machinery.
Simonyan, K. and Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Bengio, Y. and LeCun, Y., editors, 3rd International Conference on Learning Representations, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology.
Wei, X., Zhang, Y., Gong, Y., Zhang, J., and Zheng, N. (2018). Grassmann pooling as compact homogeneous bilinear pooling for fine-grained visual classification. In The European Conference on Computer Vision.
Wen, W., Wu, C., Wang, Y., Chen, Y., and Li, H. (2016). Learning structured sparsity in deep neural networks. Advances in Neural Information Processing Systems, abs/1608.03665.
Yeom, S.-K., Seegerer, P., Lapuschkin, S., Wiedemann, S., Müller, K.-R., and Samek, W. (2019). Pruning by explaining: A novel criterion for deep neural network pruning. ArXiv:1912.08881.
Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. (2014). How transferable are features in deep neural networks? Neural Information Processing Systems, abs/1411.1792.
You, K., Long, M., Wang, J., and Jordan, M. I. (2019). How does learning rate decay help modern neural networks? ArXiv:1908.01878.
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B, 68:49–67.
Zhu, M. and Gupta, S. (2018). To prune, or not to prune: Exploring the efficacy of pruning for model compression. International Conference on Learning Representations Workshop Track, abs/1710.01878.
Zoph, B. and Le, Q. V. (2016). Neural architecture search with reinforcement learning. International Conference on Learning Representations, abs/1611.01578.