Graduate student (English name): JIA-WEI JIEN
Thesis title (English): Deep learning of mean field annealing and gradient descent methods
Advisor (English name): Jiann-Ming Wu
English keywords: deep neural network; image recognition; mean field theory; gradient descent
Existing deep learning methods already combine several techniques, such as gradient descent, mini-batch training, dropout, and momentum.
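The existing techniques named above (gradient descent, mini-batch training, momentum, and dropout) can be combined in a single training loop. The following is a minimal sketch, not taken from the thesis: it assumes a plain linear model with a mean-square-error loss instead of a deep network, classical (heavy-ball) momentum, and inverted dropout applied to the inputs. The function name `train_sgd_momentum_dropout` and every hyperparameter value are hypothetical.

```python
import random


def train_sgd_momentum_dropout(data, n_features, lr=0.02, momentum=0.9,
                               batch_size=7, keep_prob=0.8, epochs=200, seed=0):
    """Hypothetical sketch: fit y ~ w.x + b by mini-batch gradient descent on the
    mean-square error, with classical momentum and inverted dropout on inputs."""
    rng = random.Random(seed)
    w = [0.0] * n_features
    b = 0.0
    vw = [0.0] * n_features  # momentum (velocity) for the weights
    vb = 0.0                 # momentum (velocity) for the bias
    data = list(data)        # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(data)    # a new mini-batch split every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            gw = [0.0] * n_features
            gb = 0.0
            for x, y in batch:
                # Inverted dropout: drop each input with probability 1 - keep_prob
                # and rescale survivors so the expected input is unchanged.
                mask = [(1.0 / keep_prob) if rng.random() < keep_prob else 0.0
                        for _ in range(n_features)]
                xd = [xi * mi for xi, mi in zip(x, mask)]
                err = sum(wi * xi for wi, xi in zip(w, xd)) + b - y
                for j in range(n_features):
                    gw[j] += 2.0 * err * xd[j] / len(batch)
                gb += 2.0 * err / len(batch)
            # Momentum update: v <- momentum * v - lr * grad, then theta <- theta + v
            for j in range(n_features):
                vw[j] = momentum * vw[j] - lr * gw[j]
                w[j] += vw[j]
            vb = momentum * vb - lr * gb
            b += vb
    return w, b  # no mask is needed at prediction time (inverted dropout)


if __name__ == "__main__":
    grid = [i / 3.0 for i in range(-3, 4)]
    data = [((a, c), 2.0 * a - c + 1.0) for a in grid for c in grid]
    w, b = train_sgd_momentum_dropout(data, n_features=2)
    print("w =", w, "b =", b)
```

In the usage example the noise-free targets are w = (2, -1) and b = 1; because input dropout acts like a weight penalty on a linear model, the learned weights are expected to be shrunk toward zero while the bias stays near 1.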
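Hinton's two-stage deep learning method has been successfully applied to image and speech recognition. The first stage is restricted Boltzmann machine (RBM) learning, and the second stage is back-propagation learning. RBM learning in the first stage gives the deep neural network better initial parameters, and the second stage starts from these initial parameters to further optimize the network's internal parameters.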
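This thesis proposes a new deep learning method. Its goal is to turn Hinton's two-stage deep learning method into a one-stage method that does not rely on restricted Boltzmann machines, while effectively reducing the mean-square error and the training error.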
Deep neural networks are widely used in machine learning. A deep neural network contains many hidden layers, and its adjustable neural interconnections give it a powerful ability to map inputs to outputs. Within different artificial intelligence frameworks, deep neural networks play different roles, including feature extraction, dimensionality reduction, and function approximation. Existing deep learning methods combine a variety of techniques, such as gradient descent, mini-batch training, dropout, and momentum. Hinton's two-stage learning method has been successfully applied to image recognition and speech recognition. The first stage is restricted Boltzmann machine (RBM) learning [1], and the second stage is backpropagation learning [2]. The first stage of RBM learning gives the deep neural network better initial parameters, and the second stage starts from these initial parameters to optimize the network's built-in parameters.
This thesis proposes a new deep learning method. Its purpose is to avoid restricted Boltzmann machines by rewriting Hinton's two-stage method as a one-stage method, while effectively reducing the mean-square error and the training error. The new method is based on hybrid mean field annealing and gradient descent learning. In the sigmoid (S-shaped) activation function, a parameter β represents the reciprocal of the annealing temperature, so a small β corresponds to a high degree of entropy and a large β to a low degree of entropy. The annealing process gradually increases β from small to large values, and the optimal parameters found during annealing are recorded. The new method can strengthen the learning ability of deep neural networks and can also be combined with existing techniques such as mini-batch training, dropout, and momentum to improve learning efficiency.
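The abstract does not give the exact update rules, so the following is only a sketch under stated assumptions, not the thesis's implementation. It assumes a network with one hidden layer whose sigmoid units all use σ(βu) = 1 / (1 + exp(-βu)) with β = 1/T, full-batch gradient descent on the mean-square error at each fixed β, a hypothetical geometric schedule β = 0.5, 1, 2, 4, 8, and a record of the parameters with the lowest MSE seen during annealing. The names `anneal_train`, `gd_epoch`, and `steps_per_beta` are hypothetical.

```python
import math
import random


def sigmoid(u, beta):
    # Annealed logistic activation: beta plays the role of 1/T (inverse temperature).
    z = max(-60.0, min(60.0, beta * u))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x, beta):
    W1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b, beta) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2, beta)
    return h, y


def mse(params, data, beta):
    return sum((forward(params, x, beta)[1] - t) ** 2 for x, t in data) / len(data)


def gd_epoch(params, data, beta, lr):
    """One full-batch gradient-descent step on the MSE at a fixed beta."""
    W1, b1, w2, b2 = params
    n_hidden, n_in = len(W1), len(W1[0])
    gW1 = [[0.0] * n_in for _ in range(n_hidden)]
    gb1 = [0.0] * n_hidden
    gw2 = [0.0] * n_hidden
    gb2 = 0.0
    for x, t in data:
        h, y = forward(params, x, beta)
        # d sigmoid(beta * u) / du = beta * s * (1 - s)
        dy = 2.0 * (y - t) * beta * y * (1.0 - y) / len(data)
        gb2 += dy
        for j in range(n_hidden):
            gw2[j] += dy * h[j]
            dh = dy * w2[j] * beta * h[j] * (1.0 - h[j])  # backprop into hidden unit j
            gb1[j] += dh
            for i in range(n_in):
                gW1[j][i] += dh * x[i]
    # Build new lists rather than mutating, so recorded snapshots stay intact.
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    w2 = [w - lr * g for w, g in zip(w2, gw2)]
    return W1, b1, w2, b2 - lr * gb2


def anneal_train(data, n_hidden=3, betas=(0.5, 1.0, 2.0, 4.0, 8.0),
                 steps_per_beta=300, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    params = ([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
              [0.0] * n_hidden, [rng.uniform(-1, 1) for _ in range(n_hidden)], 0.0)
    # Start the record with the initial parameters scored at the first beta.
    best = (mse(params, data, betas[0]), betas[0], params)
    history = []
    for beta in betas:  # beta grows, so the temperature T = 1/beta falls
        for _ in range(steps_per_beta):
            params = gd_epoch(params, data, beta, lr)
        err = mse(params, data, beta)
        history.append((beta, err))
        if err < best[0]:  # record the best parameters seen during annealing
            best = (err, beta, params)
    return best, history


if __name__ == "__main__":
    xor = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    (err, beta, _), hist = anneal_train(xor)
    print("per-beta MSE:", hist)
    print("best MSE %.4f recorded at beta = %g" % (err, beta))
```

Each candidate is scored at the β it was trained under, so in this sketch the recorded parameters should be used together with the recorded β at prediction time.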
Handwriting recognition is used as an example to illustrate the effectiveness of the new method.
I. Introduction
II. Deep learning
III. Expectation maximization
IV. Hybrid mean field annealing and gradient descent methods
V. Numerical simulation
VI. Conclusion
[1] Hinton, G. E., Osindero, S., and Teh, Y. "A fast learning algorithm for deep belief nets." Neural Computation, 18, pp. 1527-1554, 2006.
[2] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. "Learning representations by back-propagating errors." Nature, 323, pp. 533-536, 1986.
[3] LeCun, Y., Bengio, Y., and Hinton, G. E. "Deep learning." Nature, 521, pp. 436-444, 2015.
[4] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. "Dropout: A simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research, 15(1), pp. 1929-1958, 2014.
[5] Moon, T. K. "The expectation-maximization algorithm." IEEE Signal Processing Magazine, 13(6), pp. 47-60, 1996.
[6] Wu, J.-M., Chen, M. H., and Lin, Z. H. "Independent component analysis based on marginal density estimation using weighted Parzen windows." Neural Networks (accepted), 2008.
[7] Wu, J.-M. "Annealing by two sets of interactive dynamics." IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34(3), pp. 1519-1525, 2004.
