臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Author: 張舒婷
Author (English): Chang, Su-Ting
Title: 深度貝氏神經網路自適應量化之研究
Title (English): Adaptive Quantization for Deep Bayesian Neural Network
Advisor: 簡仁宗
Advisor (English): Chien, Jen-Tzung
Committee members: 簡仁宗、吳卓諭、黃思皓、蔡尚澕、曾煥鑫
Committee members (English): Chien, Jen-Tzung; Wu, Jwo-Yuh
Oral defense date: 2019-10-17
Degree: Master's
Institution: 國立交通大學 (National Chiao Tung University)
Department: 電信工程研究所 (Institute of Communications Engineering)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2019
Graduation academic year: 108 (ROC calendar)
Language: English
Number of pages: 73
Keywords (Chinese): 深度學習 (deep learning), 網路壓縮 (network compression), 量化神經網路 (quantized neural network), 變分推理 (variational inference), 貝氏神經網路 (Bayesian neural network), 變分自編碼器 (variational autoencoder)
Keywords (English): Deep learning, quantized neural network, variational inference, Bayesian neural network, variational autoencoder
Usage statistics:
  • Cited by: 0
  • Views: 249
  • Rating:
  • Downloads: 0
  • Bookmarked: 0
Abstract (translated from Chinese):
Deep learning has achieved excellent results in many classification tasks, from natural language processing to computer vision. As we know, overfitting still occurs in deep learning, and how to resolve it is an emerging branch of machine learning. In general, the high computation cost and large memory storage of deep neural networks (DNNs) make hardware implementation difficult. A challenging problem is to develop regularized solutions that balance implementation efficiency against system performance. Recent approaches fall mainly into two categories: factorizing the weight tensors, and pruning neural parameters by imposing sparsity constraints. Alternatively, the neural network can be quantized to reduce memory and accelerate computation while maintaining classification accuracy.
A quantized neural network adopts the same architecture as a regular DNN, with its parameters set to quantized values of the regular DNN parameters. The quantized parameters consist of two parts: the representation levels and the partition intervals. The goal of training the network is to find adaptive representation levels, reduce the quantization loss incurred when the parameters are quantized, and improve classification accuracy. Previous studies usually learn only the representation levels, while the partition intervals are set by rules of thumb, and the time spent tuning these hyperparameters greatly increases training time. This thesis introduces an adaptive quantized neural network based on end-to-end training. The quantizer is designed not only for regular DNNs but can also be applied to Bayesian neural networks, in which weight uncertainty strengthens the stability and robustness of model training.
Since all weights are represented only by the representation levels, the non-Bayesian adaptive quantized neural network has the advantage of a reduced memory footprint. The quantization error of mapping higher-precision parameters to the representation levels is lower than that of quantization using only a few bits. For quantization of Bayesian neural networks, minimizing the quantization error amounts to maximizing the variational lower bound, and the goal of designing the Bayesian quantizer is to find suitable deterministic values to represent the random variables. A "multi-spike-and-slab" density is used as the prior over the network weights; this assumption is made so that the posterior of the weights is likewise multi-spike-and-slab, with most weights concentrated at the spikes while still having a sufficiently large probability of escaping them. Under this prior, the Kullback-Leibler (KL) divergence term in the variational lower bound has no analytical solution, so a new approximation is derived.
The adaptive quantized network offers clear advantages in three respects. First, it removes the restriction on the quantization intervals, which means searching for the optimum in a larger parameter space. Second, the number of quantization levels is adjustable, so the weights can be quantized into any number of levels. Third, unlike previous studies, the partitions are not forced to be symmetric about zero, providing an asymmetric quantization criterion. In the experimental analysis, we examine the classification accuracy of the adaptive quantized neural network on MNIST and CIFAR10, and show that it applies to various network architectures such as LeNet and ResNet.
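For reference, the variational lower bound mentioned above is the standard evidence lower bound (ELBO) on the log likelihood of the data $\mathcal{D}$, where $q(\mathbf{w})$ denotes the variational posterior over the weights and $p(\mathbf{w})$ the multi-spike-and-slab prior; the KL term is the one for which the thesis derives a new approximation.

\[
\log p(\mathcal{D}) \;\ge\; \mathbb{E}_{q(\mathbf{w})}\!\left[\log p(\mathcal{D}\mid\mathbf{w})\right] - \mathrm{KL}\!\left(q(\mathbf{w}) \,\|\, p(\mathbf{w})\right)
\]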
Abstract (English):
Deep learning has achieved great success in many classification tasks ranging from natural language processing to computer vision. As is well known, deep learning is an emerging and growing branch of machine learning in which the overfitting problem still occurs and model regularization is required. In general, deep neural networks (DNNs) suffer from high computation cost and memory storage, which makes hardware implementation on end devices difficult. A challenging issue is to develop a regularized solution that balances the tradeoff between implementation efficiency and system performance. This issue was recently tackled through complexity reduction in DNNs, where either tensor weights were decomposed or neural parameters were pruned by imposing sparsity constraints. Alternatively, quantized neural networks were proposed to reduce memory storage and speed up computation without sacrificing too much classification accuracy.
Quantized neural networks basically adopt the same architecture as regular DNNs except that the precision of the weights is reduced by quantizing the parameter values. Quantization in a DNN consists of two components: the representation levels and the partition intervals. The learning objective is to minimize the quantization loss by adaptively choosing these two quantization parameters. Previous studies usually learn the representation levels but handcraft the partitions. A significant concern in the construction of quantized neural networks is the computation time spent on a large amount of hyperparameter tuning. This dissertation presents a direct solution that adaptively quantizes neural network parameters based on an end-to-end learning machine. The quantization schemes are developed not only for non-Bayesian DNNs but also for Bayesian DNNs, where weight uncertainties are considered to enhance the robustness of the estimated model.
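As a rough illustration of the two components described above, the following sketch maps full-precision weights onto learnable representation levels through learnable partition thresholds; the function and variable names (`quantize`, `levels`, `thresholds`) and the toy values are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def quantize(weights, levels, thresholds):
    """Map full-precision weights to their representation levels.

    levels:     sorted array of K representation values (learnable).
    thresholds: sorted array of K-1 partition boundaries (learnable);
                they need not be symmetric around zero.
    """
    # Index of the partition interval that each weight falls into.
    idx = np.searchsorted(thresholds, weights)
    return levels[idx]

# Toy example with three levels (ternary-like, but asymmetric).
w = np.array([-0.9, -0.2, 0.05, 0.4, 1.3])
levels = np.array([-0.7, 0.0, 0.8])
thresholds = np.array([-0.35, 0.25])
w_q = quantize(w, levels, thresholds)

# A quantization loss of this kind is what end-to-end training would
# minimize jointly with the task loss; gradients would be passed through
# the non-differentiable mapping with a straight-through estimator.
quant_loss = np.mean((w - w_q) ** 2)
```

In this view, learning the thresholds jointly with the levels is what distinguishes the adaptive scheme from earlier methods that handcraft the partitions.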
Since the weights are represented only by the quantized values, adaptive quantization of non-Bayesian DNNs provides the merit of memory reduction. In addition, the representation values are learned from full-precision values, so the loss of performance due to quantization errors is smaller than in previous methods that compute with only a few bits. In Bayesian DNNs, the model parameters are assumed to be random in model construction. Minimizing the quantization loss is equivalent to maximizing the variational lower bound of the log likelihood in which the weight uncertainties are marginalized. The quantization scheme in Bayesian DNNs addresses how deterministic values are adopted to represent stochastic weights. The implementation is based on the “multi-spike-and-slab” distribution, which reflects the prior representation levels or clusters of the weight parameters; the resulting posterior also corresponds to a “multi-spike-and-slab” distribution. A new variational inference procedure for quantization of Bayesian DNNs is accordingly formulated for efficient computation, in which the Kullback-Leibler (KL) divergence between the true posterior and the parameterized posterior is minimized.
The proposed quantization methods provide three advantages. First, the restriction in adaptive quantization is relaxed, so the optimal values are searched over a wide range of the parameter space. Second, the number of representation levels is flexible and adjustable, whereas it was fixed in previous methods. Third, the partition intervals are adaptive and asymmetric instead of fixed and symmetric as in traditional methods. In the experiments, the proposed methods are illustrated by a number of analytical results and classification accuracies on the MNIST and CIFAR10 datasets. Furthermore, we use LeNet and ResNet to demonstrate that the quantization methods are feasible for different network structures.
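To make the “multi-spike-and-slab” prior more concrete, here is a minimal sketch assuming the common construction of narrow Gaussian “spikes” centred at the representation levels mixed with one broad Gaussian “slab”; the widths, mixture weights, and names below are illustrative assumptions, not the exact formulation in the thesis. As the abstract notes, the KL divergence between such a prior and the variational posterior has no closed form, which is why a new approximation is derived.

```python
import numpy as np
from scipy.stats import norm

def log_multi_spike_and_slab(w, levels, spike_std=0.01, slab_std=1.0, slab_weight=0.1):
    """Log-density of a mixture prior: a narrow Gaussian ("spike") at each
    representation level plus one broad Gaussian ("slab") that gives a weight
    a non-negligible probability of escaping the spikes."""
    spike_weight = (1.0 - slab_weight) / len(levels)
    # Density contributions from each spike and from the slab.
    spike_pdf = np.stack([norm.pdf(w, loc=m, scale=spike_std) for m in levels])
    slab_pdf = norm.pdf(w, loc=0.0, scale=slab_std)
    mixture = spike_weight * spike_pdf.sum(axis=0) + slab_weight * slab_pdf
    return np.log(mixture + 1e-12)

# Weights close to a representation level receive high prior density;
# outliers are still supported by the slab.
levels = np.array([-0.7, 0.0, 0.8])
w = np.array([-0.71, 0.02, 0.35, 0.79])
print(log_multi_spike_and_slab(w, levels))
```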
Table of contents:
English Abstract i
Contents iii
List of Tables vi
List of Figures vii
List of Algorithms ix
List of Notations x
1 Introduction 1
1.1 Background 1
1.2 Motivation 2
1.3 Outline 3
2 Deep Learning 5
2.1 Deep Neural Networks 5
2.2 Convolutional Neural Networks 9
2.2.1 Introduction 10
2.2.2 Forward Propagation 1
2.3 Bayesian Neural Network 12
2.3.1 Point Estimation 14
2.3.2 Bayesian Inference 15
2.3.3 Approximate inference 16
2.3.4 Variational inference 17
2.3.5 Sampling methods 19
2.4 Straight Through Estimator 20
3 Quantization Neural Network 22
3.1 Network Compression 22
3.1.1 Quantization 22
3.1.2 Pruning method 23
3.1.3 Tensor decomposition 23
3.2 Binary Neural Network 24
3.2.1 Binarized function and forward propagation 24
3.2.2 Gradients propagation 24
3.2.3 Bit-shift acceleration 26
3.3 Ternary Neural Network 26
3.3.1 Symmetric ternary neural network 27
3.3.2 Asymmetric ternary neural network 28
3.4 Vector Quantization 31
3.4.1 Hessian-weighted network quantization 32
3.4.2 Entropy-constrained network quantization 33
4 Adaptive Quantization Neural Network 34
4.1 Introduction 34
4.2 Separate Adaptive Quantization Neural Network 35
4.2.1 Optimization for quantization partition 35
4.2.2 Optimization for representation levels 38
4.2.3 Algorithm for separate adaptive quantization method 41
4.3 Joint Adaptive Quantization Neural Network 43
4.3.1 Variational inference in quantization 44
4.3.2 Algorithm 50
4.4 Memory Storage 52
5 Experiments 54
5.1 MNIST 54
5.2 CIFAR10 55
5.2.1 LeNet 56
5.2.2 ResNet 57
6 Conclusions and Future Works 63
6.1 Conclusions 63
6.2 Future Works 64
Electronic full text (internet release date: 2024-11-26)