National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: 梁晏綸
Author (English): Yen-Lun Liang
Title (Chinese): 在多層感知器訓練時加入權重雜訊和權重衰減的研究
Title (English): Empirical studies on the online learning algorithms based on combining weight noise injection and weight decay
Advisor: 沈培輝
Degree: Master's
University: National Chung Hsing University
Department: Graduate Institute of Technology Management
Discipline: Business and Management
Academic Field: Other Business and Management
Document Type: Academic thesis
Year of Publication: 2010
Graduation Academic Year: 98 (2009-2010)
Language: English
Number of Pages: 126
Keywords (Chinese): 類神經網路、多層感知器、容錯、權重衰減、權重雜訊
Keywords (English): Neural Networks; Multilayer Perceptron (MLP); Fault Tolerance; Weight Decay; Weight Noise
In neural networks, injecting weight noise during training has been widely adopted to improve fault tolerance, but its effectiveness has not been confirmed either theoretically or empirically. In this thesis, we examine two aspects of the online learning algorithms that combine weight noise injection with weight decay during training. Multiplicative weight noise and additive weight noise are treated separately. To characterize the convergence behavior and the fault tolerance of the resulting networks, we carried out extensive computer simulations. The results show that (1) injecting weight noise alone during training does not make the weights converge; (2) combining weight noise injection with weight decay yields better convergence than weight noise injection alone; and (3) combining weight noise injection with weight decay yields better fault tolerance than weight noise injection alone.

This thesis makes two contributions. First, part of the results complements the recent findings of Ho, Leung & Sum on the convergence of training with weight noise injection. Second, the remaining results, which concern fault tolerance, are new to the field of neural networks. Finally, the thesis also conveys an important message about adding weight decay during training: weight decay not only improves the convergence of the weights, but also improves the fault tolerance of the resulting neural network.

While injecting weight noise during training has been widely adopted for attaining fault-tolerant neural networks, theoretical and empirical studies on the online algorithms developed from these strategies are not yet complete. In this thesis, we investigate two important aspects of the online learning algorithms based on combining weight noise injection and weight decay. Multiplicative weight noise and additive weight noise are considered separately. The convergence behaviors and the performance of these learning algorithms are investigated via intensive computer simulations. It is found that (i) the online learning algorithm based on purely multiplicative weight noise injection does not converge, (ii) the algorithms combining weight noise injection and weight decay exhibit better convergence behaviors than their pure weight-noise-injection counterparts, and (iii) the neural networks attained by the algorithms combining weight noise injection and weight decay show better fault tolerance than the neural networks attained by the pure weight-noise-injection algorithms.
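
For concreteness, the following is a minimal sketch, written in Python/NumPy rather than the thesis's MATLAB code, of one online update that combines weight noise injection with weight decay for a single-hidden-layer MLP. The network size, learning rate `eta`, decay constant `lam`, noise level `sb`, and function names are illustrative assumptions, not values or code taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W, v):
    """Single-hidden-layer MLP: f(x) = v . tanh(W x)."""
    h = np.tanh(W @ x)
    return v @ h, h

def noisy_update(x, y, W, v, eta=0.05, lam=1e-4, sb=0.01, mode="additive"):
    """One online step: perturb the weights, backpropagate through the
    perturbed weights, then update the clean weights with an extra
    weight-decay pull (-eta * lam * w) toward zero."""
    if mode == "multiplicative":
        Wn = W * (1.0 + sb * rng.standard_normal(W.shape))
        vn = v * (1.0 + sb * rng.standard_normal(v.shape))
    else:  # additive weight noise
        Wn = W + sb * rng.standard_normal(W.shape)
        vn = v + sb * rng.standard_normal(v.shape)
    y_hat, h = forward(x, Wn, vn)
    err = y_hat - y                              # gradient of 0.5 * (y_hat - y)^2
    grad_v = err * h
    grad_W = np.outer(err * vn * (1.0 - h**2), x)
    v_new = v - eta * grad_v - eta * lam * v
    W_new = W - eta * grad_W - eta * lam * W
    return W_new, v_new

# Toy usage: one update on a single (x, y) pair.
W = 0.5 * rng.standard_normal((5, 2))
v = 0.5 * rng.standard_normal(5)
W, v = noisy_update(np.array([0.3, -0.7]), 0.1, W, v, mode="multiplicative")
```

Setting `lam = 0` in this sketch recovers pure weight noise injection, the variant that the abstract reports does not converge in the multiplicative case.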

The contributions of these results are twofold. First, part of these empirical results complements the recent findings of Ho, Leung & Sum on the convergence behaviors of weight-noise-injection-based learning algorithms. Second, the other part of the results, which concerns fault tolerance, is new to the area. Finally, the results presented in this thesis also bring out an important message about adding weight decay during training: weight decay not only improves the convergence of an algorithm, but also improves the weight noise tolerance of a neural network attained by these online algorithms.
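
As a companion illustration of how such fault tolerance can be quantified, the sketch below perturbs a trained network's weights many times at each noise level and averages the test error. The noise-level grid, trial count, and function names are illustrative assumptions, not the thesis's exact evaluation protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

def test_mse(X, Y, W, v):
    """Mean squared error of the MLP v . tanh(W x) over a test set (X, Y)."""
    H = np.tanh(X @ W.T)          # (N, hidden) activations for all test inputs
    return float(np.mean((H @ v - Y) ** 2))

def fault_tolerance_curve(X, Y, W, v, levels=(0.0, 0.05, 0.1, 0.2),
                          trials=100, mode="multiplicative"):
    """Average test MSE after randomly perturbing the trained weights,
    repeated `trials` times at each noise level."""
    curve = []
    for s in levels:
        errs = []
        for _ in range(trials):
            if mode == "multiplicative":
                Wf = W * (1.0 + s * rng.standard_normal(W.shape))
                vf = v * (1.0 + s * rng.standard_normal(v.shape))
            else:  # additive weight noise on the deployed network
                Wf = W + s * rng.standard_normal(W.shape)
                vf = v + s * rng.standard_normal(v.shape)
            errs.append(test_mse(X, Y, Wf, vf))
        curve.append((s, float(np.mean(errs))))
    return curve
```

Under finding (iii) of the abstract, a network trained with the combined algorithm would be expected to show a flatter curve (slower error growth with the noise level) than one trained with weight noise injection alone.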


1 INTRODUCTION
1.1 Background
1.2 Contributions
1.3 Outline of thesis
2 LITERATURE REVIEW
3 LEARNING ALGORITHMS
3.1 Back propagation algorithm 1 (BPA.1)
3.1.1 Objective function
3.1.2 Update equations
3.2 BPA.1 with weight decay
3.2.1 Objective function
3.2.2 Update equations
3.3 BPA.2
3.3.1 Objective function
3.3.2 Update equations
3.4 BPA.2 with weight decay
3.4.1 Objective function
3.4.2 Update equations
3.5 Weight noise injection training algorithm
3.5.1 Weight noise injection for BPA.1
3.5.2 Weight noise injection for BPA.1 with weight decay
3.5.3 Weight noise injection for BPA.2
3.5.4 Weight noise injection for BPA.2 with weight decay
4 EXPERIMENTS
4.1 Data sets
4.2 Methodology
4.3 Results
4.3.1 2D mapping
4.3.2 Mackey-Glass
4.3.3 NAR
4.3.4 Astrophysical data
4.3.5 XOR
4.3.6 Handwritten recognition
4.3.7 Summary
5 CONCLUSION
A FAULT TOLERANCE ABILITY
A.1 2D mapping
A.1.1 Multiplicative Weight Noise (MWN)
A.1.2 Additive Weight Noise (AWN)
A.2 Mackey-Glass
A.2.1 MWN
A.2.2 AWN
A.3 Nonlinear autoregressive time series (NAR)
A.3.1 MWN
A.3.2 AWN
A.4 Astrophysical data
A.4.1 MWN
A.4.2 AWN
A.5 XOR
A.5.1 MWN
A.5.2 AWN
A.6 Semeion handwritten digit recognition
A.6.1 MWN
A.6.2 AWN
B THE CHANGES OF THE WEIGHT VALUES DURING TRAINING
B.1 2D mapping
B.1.1 MWN
B.1.2 AWN
B.2 Mackey-Glass
B.2.1 MWN
B.2.2 AWN
B.3 NAR
B.3.1 MWN
B.3.2 AWN
B.4 Astrophysical data
B.4.1 MWN
B.4.2 AWN
B.5 XOR
B.5.1 MWN
B.5.2 AWN
B.6 Semeion handwritten digit recognition
B.6.1 MWN
B.6.2 AWN
C MATLAB PROGRAM CODES
Bibliography


[1] G. An. The effects of adding noise during backpropagation training on a generalization performance. Neural Computation, Vol. 8, pp. 643-674, 1996.
[2] G. Basalyga and E. Salinas. When response variability increases neural network robustness to synaptic noise. Neural Computation, Vol. 18, pp. 1349-1379, 2006.
[3] J.L. Bernier, J. Ortega, E. Ros, I. Rojas, and A. Prieto. Obtaining fault tolerant multilayer perceptrons using an explicit regularization. Neural Processing Letters, Vol. 12, pp. 107-113, 2000.
[4] J.L. Bernier, J. Ortega, I. Rojas, and A. Prieto. Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations. Neurocomputing, Vol. 31, pp. 87-103, Jan. 2000.
[5] G. Bolt. Fault tolerance in multi-layer perceptrons. PhD thesis, University of York, UK, 1992.
[6] M. Buscema. MetaNet: The theory of independent judges. Substance Use and Misuse, Vol. 33, No. 2, pp. 439-461, Jan. 1998.
[7] S. Cavalieri and O. Mirabella. A novel learning algorithm which improves the partial fault tolerance of multilayer neural networks. Neural Networks, Vol. 12, pp. 91-106, 1999.
[8] S. Chen. Local regularization assisted orthogonal least squares regression. Neurocomputing, pp. 559-585, 2006.
[9] D. Deodhare, M. Vidyasagar, and S. Sathiya Keerthi. Synthesis of fault-tolerant feedforward neural networks using minimax optimization. IEEE Transactions on Neural Networks, Vol. 9, No. 5, pp. 891-900, 1998.
[10] P.J. Edwards and A.F. Murray. Can deterministic penalty terms model the effects of synaptic weight noise on network fault-tolerance? International Journal of Neural Systems, Vol. 6, No. 4, pp. 401-416, 1995.
[11] P.J. Edwards and A.F. Murray. Fault tolerance via weight noise in analog VLSI implementations of MLPs - a case study with EPSILON. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 45, No. 9, pp. 1255-1262, Sep. 1998.
[12] C.T. Chiu et al. Modifying training algorithms for improved fault tolerance. Proc. ICNN'94, Vol. I, pp. 333-338, 1994.
[13] N. Kamiura et al. On a weight limit approach for enhancing fault tolerance of feedforward neural networks. IEICE Transactions on Information & Systems, Vol. E83-D, No. 11, 2000.
[14] R. Velazco et al. SEU fault tolerance in artificial neural networks. IEEE Transactions on Nuclear Science, Vol. 42, No. 6, pp. 1856-1862, 1995.
[15] N.C. Hammadi and I. Hideo. A learning algorithm for fault tolerant feedforward neural networks. IEICE Transactions on Information & Systems, Vol. E80-D, No. 1, 1997.
[16] B. Hassibi and D.G. Stork. Second order derivatives for network pruning: Optimal Brain Surgeon. In Hanson et al. (eds.), Advances in Neural Information Processing Systems, pp. 164-171, 1993.
[17] S. Himavathi, D. Anitha, and A. Muthuramalingam. Feedforward neural network implementation in FPGA using layer multiplexing for effective resource utilization. IEEE Transactions on Neural Networks, Vol. 18, pp. 880-888, 2007.
[18] K. Ho, C.S. Leung, and J. Sum. On weight-noise-injection training. In M. Koeppen, N. Kasabov, and G. Coghill (eds.), Advances in Neuro-Information Processing, Springer LNCS 5507, pp. 919-926, 2009.
[19] K.C. Jim, C.L. Giles, and B.G. Horne. An analysis of noise in recurrent neural networks: Convergence. IEEE Transactions on Neural Networks, Vol. 7, pp. 1424-1438, 1996.
[20] E.W.M. Lee, C.P. Lim, R.K.K. Yuen, and S.M. Lo. A hybrid neural network model for noisy data regression. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 34, No. 2, pp. 951-960, Apr. 2004.
[21] C.S. Leung and J. Sum. A fault tolerant regularizer for RBF networks. IEEE Transactions on Neural Networks, Vol. 19, No. 3, pp. 493-507, 2008.
[22] C.S. Leung, K.W. Wong, J. Sum, and L.W. Chan. On-line training and pruning for RLS algorithms. Electronics Letters, Vol. 32, No. 23, pp. 2152-2153, 1996.
[23] C.S. Leung, K.W. Wong, P.F. Sum, and L.W. Chan. A pruning method for the recursive least square algorithm. Neural Networks, 2001.
[24] C.S. Leung, G.H. Young, J. Sum, and W.K. Kan. On the regularization of forgetting recursive least square. IEEE Transactions on Neural Networks, Vol. 10, pp. 1842-1846, 1999.
[25] J.E. Moody. Note on generalization, regularization, and architecture selection in nonlinear learning systems. First IEEE-SP Workshop on Neural Networks for Signal Processing, 1991.
[26] N. Murata, S. Yoshizawa, and S. Amari. Network information criterion - determining the number of hidden units for an artificial neural network model. IEEE Transactions on Neural Networks, Vol. 5, No. 6, pp. 865-872, 1994.
[27] A.F. Murray and P.J. Edwards. Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements. IEEE Transactions on Neural Networks, Vol. 4, No. 4, pp. 722-725, 1993.
[28] A.F. Murray and P.J. Edwards. Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training. IEEE Transactions on Neural Networks, Vol. 5, No. 5, pp. 792-802, 1994.
[29] M.W. Pederson, L.K. Hansen, and J. Larsen. Pruning with generalization based weight saliencies: robd, robs. Advances in Neural Information Processing Systems 8, pp. 512-528, 1996.
[30] D.S. Phatak and I. Koren. Complete and partial fault tolerance of feedforward neural nets. IEEE Transactions on Neural Networks, Vol. 6, pp. 446-456, 1995.
[31] R. Reed. Pruning algorithms - a survey. IEEE Transactions on Neural Networks, Vol. 4, No. 5, pp. 740-747, 1993.
[32] A.W. Savich, M. Moussa, and S. Areibi. The impact of arithmetic representation on implementing MLP-BP on FPGAs: A study. IEEE Transactions on Neural Networks, Vol. 18, pp. 240-252, 2007.
[33] C. Neti, M.H. Schneider, and E.D. Young. Maximally fault tolerant neural networks. IEEE Transactions on Neural Networks, Vol. 3, No. 1, pp. 14-23, 1992.
[34] C.H. Sequin and R.D. Clay. Fault tolerance in feedforward artificial neural networks. Neural Networks, Vol. 4, pp. 111-141, 1991.
[35] S. Singh. Noise impact on time-series forecasting using an intelligent pattern matching technique. Pattern Recognition, Vol. 32, pp. 1389-1398, 1999.
[36] M. Sugiyama and H. Ogawa. Optimal design of regularization term and regularization parameter by subspace information criterion. Neural Networks, Vol. 15, pp. 349-361, 2002.
[37] J. Sum and K. Ho. SNIWD: Simultaneous weight noise injection with weight decay for MLP training. Proc. ICONIP 2009, Bangkok, Thailand, 2009.
[38] J. Sum, C.S. Leung, and K. Ho. On objective function, regularizer and prediction error of a learning algorithm for dealing with multiplicative weight noise. IEEE Transactions on Neural Networks, Vol. 20, No. 1, Jan. 2009.

