臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Student: 黃煜閔
Student (romanized): Huang, Yu-Min
Title: 隨機捲積遞迴網路
Title (English): Stochastic Convolutional Recurrent Networks
Advisor: 簡仁宗
Advisor (romanized): Chien, Jen-Tzung
Committee members: 吳卓諭、黃思皓、蔡尚澕、曾煥鑫
Committee members (romanized): Wu, Jwo-Yuh; Huang, Szu-Hao; Tsai, Shang-Ho; Tseng, Huan-Hsin
Oral defense date: 2019-10-17
Degree: Master's
Institution: 國立交通大學 (National Chiao Tung University)
Department: 電機工程學系 (Electrical Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2019
Academic year of graduation: 108 (2019-2020)
Language: English
Number of pages: 70
Keywords (Chinese): 深度學習、序列學習、自迴歸生成、變分推理、捲積神經網路、遞迴神經網路
Keywords (English): deep learning, sequential learning, autoregressive generation, variational inference, convolutional neural network, recurrent neural network
Usage statistics:
  • Cited: 0
  • Views: 273
  • Downloads: 0
  • Bookmarked: 0
Deep learning has achieved great success in the fields of computer vision and natural language processing. Basically, deep neural networks can handle the complicated mappings between high-dimensional inputs and output targets and perform well on classification and regression problems. However, deep neural networks still cannot achieve the desired results in generation tasks over high-dimensional data. Meanwhile, sequence data are everywhere: speech, text, and video are all forms of sequential data. When we deal with sequential learning and generation, it is important, because of the causal property of the signals, to predict or generate future targets based on past samples. We call such prediction based on all past observations autoregressive generation. In this thesis, we propose a new stochastic representation learning method for autoregressive generation, in which the generation and inference procedures are built by extending the convolutional neural network (CNN) and the recurrent neural network (RNN).
A recurrent neural network expresses sequential patterns and captures temporal information by evolving dynamic states that act as an internal memory, and for many years it has been regarded as a common temporal learning method for autoregressive models. Convolutional neural networks have achieved great success in computer vision, and in recent years temporal convolutional networks have also been applied to temporal learning. Benefiting from parallel computation, a temporal convolutional network can generate rapidly, and because each layer has a receptive field of a different size, a multilayer temporal convolutional network captures hierarchical temporal relations. Both the recurrent neural network and the temporal convolutional network are feasible for sequential learning, and this thesis aims to combine the advantages of the two to build a convolutional recurrent network (CRN). Basically, the temporal convolutional network has a strong ability to capture local features, while the recurrent neural network can capture long-term relations in a sequence. The proposed convolutional recurrent network uses convolutional layers to encode local information and then uses recurrent layers to decode, predict, and generate. By taking the temporal convolutional network as the encoder and the recurrent neural network as the decoder, we propose a hybrid model that combines the convolutional neural network and the recurrent neural network, so that both local and global features are fully expressed. Importantly, the recurrent layers remove the constraint in the temporal convolutional network that the receptive field size is limited by the number of layers, so the convolutional recurrent network can make up for the long-term information that the temporal convolutional network cannot adequately express.
We then propose two further extensions of the convolutional recurrent network: the stochastic convolutional recurrent network (SCRN) and the multi-resolution convolutional recurrent network (multi-resolution CRN). The stochastic convolutional recurrent network aims to improve the robustness of the convolutional recurrent network by introducing randomness. The stochastic latent variables are optimized through variational inference, in which the evidence lower bound is maximized. On the other hand, we question whether the information encoded in the convolutional recurrent network is rich enough when only the output of the last convolutional layer is passed to the recurrent layers; the final prediction may be limited as a result. Accordingly, we propose the multi-resolution convolutional recurrent network, which passes the multi-resolution information learned in different convolutional layers to different recurrent layers for decoding. In this way, relations over different temporal lengths can all be learned in the multi-resolution convolutional recurrent network.
In the language modeling experiments, we compare the performance of various variants of the convolutional recurrent network, and the advantages of the proposed models over the recurrent neural network and the temporal convolutional network are clearly demonstrated.
Deep learning has achieved great success in computer vision and natural language processing. Basically, deep neural networks (DNNs) are able to handle high-dimensional data with complicated mappings between input data and output targets, and they perform well on a variety of classification and regression tasks. Nevertheless, it is still challenging to carry out desirable generation tasks in the presence of high-dimensional data. Meanwhile, sequence data are everywhere in the real world, ranging from speech signals to natural sentences and video streams, to name a few. When we deal with sequential learning and generation, it is important to predict or generate future targets based on all previous samples because of the causal property of the signals. Such prediction, where the output at each time step is conditioned on all previous observations, is known as autoregressive generation. This thesis presents a new stochastic learning representation for autoregressive generation in which the inference and generative procedures are developed on the basis of the convolutional neural network (CNN) and the recurrent neural network (RNN).
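To make the conditioning explicit, autoregressive generation factorizes the joint distribution of a length-T sequence into per-time-step conditionals. The following is the standard factorization, written here for reference rather than copied from the thesis:

```latex
% Autoregressive factorization: each sample x_t is generated conditioned
% on all previous observations x_{<t} = (x_1, ..., x_{t-1}).
p_\theta(x_{1:T}) = \prod_{t=1}^{T} p_\theta\left(x_t \mid x_{<t}\right)
```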
An RNN is specialized to characterize sequential patterns and extract temporal information by evolving dynamic states through time as an internal memory, and it has long been recognized as a popular sequential learning solution for autoregressive models. Recently, the temporal convolutional network (TCN) has been proposed for sequential learning, although CNNs were originally developed for spatial modeling in computer vision. Typically, the TCN benefits from parallel computation, which enables rapid generation, and a multilayer TCN can capture a temporal hierarchy in which different layers represent receptive fields of different sizes. Both the RNN and the TCN are feasible for sequential learning. This work aims to combine the advantages of the TCN and the RNN to construct the so-called convolutional recurrent network (CRN). Basically, the TCN is powerful at extracting local features, while the RNN is able to capture long-term temporal dependencies in sequence data. The proposed CRN infers or encodes local information via convolutional layers and then predicts, decodes, or generates each individual time sample via recurrent layers. In other words, the CRN implements the TCN as an encoder and the RNN as a decoder, establishing a hybrid model in which complementary local and global features are characterized. Importantly, the recurrent layers in the CRN relax the limitation of the TCN that the size of the receptive field is constrained by the number of layers. By allocating recurrent layers on top of the convolutional layers, the CRN compensates for the insufficient long-term temporal characteristics of the TCN.
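As one possible concretization of the encoder-decoder structure described above, the following is a minimal PyTorch sketch of a CRN-style model in which causal temporal convolutions encode local context and a GRU decodes predictions over time. The class names, layer widths, kernel sizes, and the choice of GRU are illustrative assumptions and do not reproduce the exact architecture or hyperparameters used in the thesis.

```python
# Minimal sketch of a convolutional recurrent network (CRN):
# causal temporal convolutions as the encoder, an RNN (GRU) as the decoder.
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution that only looks at past time steps."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        out = self.conv(x)
        # Trim the extra right-side context so no future samples leak in.
        return out[..., :-self.pad] if self.pad else out

class CRN(nn.Module):
    """TCN-style encoder for local features, recurrent decoder for long-term context."""
    def __init__(self, in_dim=1, conv_ch=32, hidden=64, out_dim=1):
        super().__init__()
        self.encoder = nn.Sequential(
            CausalConv1d(in_dim, conv_ch, kernel_size=3, dilation=1), nn.ReLU(),
            CausalConv1d(conv_ch, conv_ch, kernel_size=3, dilation=2), nn.ReLU(),
        )
        self.decoder = nn.GRU(conv_ch, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x):  # x: (batch, time, in_dim)
        z = self.encoder(x.transpose(1, 2)).transpose(1, 2)  # local encodings
        h, _ = self.decoder(z)                                # recurrent decoding
        return self.head(h)                                   # one prediction per step

x = torch.randn(8, 100, 1)   # toy batch: 8 sequences of length 100
print(CRN()(x).shape)        # torch.Size([8, 100, 1])
```

The point this sketch illustrates is that the recurrent decoder carries information beyond the convolutional receptive field, which is the TCN limitation the CRN is meant to relax.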
The proposed CRN is further improved by two extensions: the stochastic CRN (SCRN) and the multi-resolution CRN. The stochastic CRN aims to improve the robustness of the CRN by incorporating stochasticity into the model. The randomness of the latent variables is handled in the optimization procedure via variational inference, where the variational lower bound of the log likelihood, marginalized over the latent variables, is maximized. On the other hand, we question the richness of the encoded information in the CRN, where only the latent variable of the last convolutional layer is fed into the recurrent layer; the prediction performance may be constrained as a result. Accordingly, the multi-resolution CRN is developed by capturing multi-resolution encodings from different convolutional layers and feeding them into different recurrent layers for decoding and prediction at each time step. In this way, temporal dependencies of different lengths are learned by the multi-resolution CRN.
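For reference, the generic form of the variational lower bound (evidence lower bound) maximized in such variational inference is sketched below. The exact factorization of the latent variables over time steps and layers in the SCRN may differ from this simplified, single-variable form.

```latex
% Generic evidence lower bound (ELBO): a reconstruction term minus the KL
% divergence between the approximate posterior q_phi and the prior p_theta.
% In a sequential model the bound is typically summed over time steps, with
% z_t conditioned on the history.
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right)
```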
Experiments on language modeling and action recognition are conducted to investigate the performance of different variants of convolutional recurrent networks. We show the merits of the proposed methods by comparing them with the RNN and the TCN under different experimental settings.
1 Introduction 1
1.1 Background 1
1.2 Motivation 2
1.3 Outline 3
2 Deep Learning 5
2.1 Deep Neural Network 5
2.1.1 Error backpropagation 8
2.1.2 Stochastic gradient descent 8
2.2 Recurrent Neural Network 10
2.2.1 Backpropagation through time 11
2.2.2 Long short-term memory 12
2.3 Convolutional Neural Network 13
2.3.1 Convolutional layer 13
2.3.2 Pooling layer 14
2.3.3 Temporal Convolutional Network 15
2.4 Variational Autoencoder 16
2.4.1 Autoencoder 16
2.4.2 Variational inference 17
2.4.3 Variational autoencoder 19
2.4.4 Ladder variational autoencoder 24
3 Deterministic Autoregressive Models 27
3.1 Dilated RNN 27
3.2 CLDNN 29
4 Stochastic Autoregressive Models 31
4.1 Variational Recurrent Neural Network 31
4.1.1 Network architecture 32
4.1.2 Optimization procedure 34
4.2 Stochastic Wavenet 34
4.2.1 Network architecture 35
4.2.2 Optimization procedure 37
4.3 Stochastic Temporal Convolutional Network 38
4.3.1 Network architecture 38
4.3.2 Optimization procedure 40
5 Stochastic Convolutional Recurrent Network 42
5.1 Convolutional Recurrent Network 42
5.2 Stochastic Convolutional Recurrent Network 44
5.3 Multi-resolution Convolutional Recurrent Network 48
5.4 Generalization 51
5.5 Comparison 53
6 Experiments 56
6.1 Language Modeling 56
6.2 Action Recognition 61
7 Conclusion and Future Works 63
7.1 Conclusions 63
7.2 Future Works 64