National Digital Library of Theses and Dissertations in Taiwan

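Author: 楊芷璇 (Chih-Hsuan Yang)
Thesis title: English Chinese Translation System Using Attention Model (Chinese title: 基於注意力之英中對譯係統)
Advisor: 王家慶 (Jia-Ching Wang)
Degree: Master's
Institution: National Central University
Department: Department of Computer Science and Information Engineering
Field: Engineering
Discipline: Electrical and information engineering
Thesis type: Academic thesis
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar, i.e., 2017–2018)
Language: Chinese
Pages: 75
Keywords: machine translation; neural machine translation; deep learning; natural language processing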
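Abstract (translated from the Chinese)
Deep neural networks have achieved remarkable results in natural language processing. Machine translation is an important natural language processing task, and it has mainly relied on two neural network architectures: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Translation quality, however, depends on the vocabulary and grammatical structure of the languages involved. Sentences produced by deep models often read ungrammatically or fail to align the two languages at the sentence or word level. In recent years, a team at Google proposed the Transformer, an attention model that uses neither CNNs nor RNNs. Using only an encoder-decoder model with an attention mechanism, it achieved significant results in machine translation. The architecture proposed in this thesis takes the Transformer as its base model. The model is built from multi-layer encoders and decoders. It uses multi-head attention to score the similarity between the source-language sentence and the target-language sentence, which aligns the words of the two languages. The goal of this thesis is to improve translation quality, so the proposed architecture modifies the Transformer by adding residual and dense connections. Information can be lost as it passes through many layers while the model computes attention. To prevent this, information from earlier layers is connected to later layers to improve the trained model. Finally, the proposed method and the original Transformer are applied to an English-Chinese translation system and compared with the machine translation evaluation metrics BLEU and WER. The results show that the proposed attention model translates better than the Transformer.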
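The abstract describes multi-head attention aligning target-language positions with source-language positions. This idea can be illustrated with the scaled dot-product attention formula from the Transformer paper, softmax(QK^T / sqrt(d_k))V. The block below is a minimal pure-Python sketch built on several assumptions:

- It uses toy sizes.
- The projection matrices are random and untrained.
- The helper names (`matmul`, `multi_head_attention`, `random_matrix`) are hypothetical.

It is not the thesis's implementation.

```python
import math
import random

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def softmax(row: list[float]) -> list[float]:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> tuple[Matrix, Matrix]:
    """Return softmax(Q K^T / sqrt(d_k)) V together with the attention weights.

    Row i of the weights is a distribution over source positions, i.e. a soft
    alignment of target position i to the source sentence."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical helper: random values stand in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def multi_head_attention(target: Matrix, source: Matrix, num_heads: int,
                         rng: random.Random) -> tuple[Matrix, list[Matrix]]:
    """Encoder-decoder attention: queries come from the target, keys and values
    from the source. Projections are random (untrained) to show the data flow only."""
    d_model = len(target[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs, head_weights = [], []
    for _ in range(num_heads):
        w_q, w_k, w_v = (random_matrix(d_model, d_head, rng) for _ in range(3))
        out, weights = scaled_dot_product_attention(
            matmul(target, w_q), matmul(source, w_k), matmul(source, w_v))
        head_outputs.append(out)
        head_weights.append(weights)
    # Concatenate the heads feature-wise, then project back to d_model.
    concat = [[x for head in head_outputs for x in head[i]] for i in range(len(target))]
    w_o = random_matrix(d_model, d_model, rng)
    return matmul(concat, w_o), head_weights

rng = random.Random(0)
source = random_matrix(5, 8, rng)  # 5 source tokens, d_model = 8 (toy sizes)
target = random_matrix(3, 8, rng)  # 3 target tokens
output, weights = multi_head_attention(target, source, num_heads=2, rng=rng)
# Crude hard alignment: the most-attended source position per target position (head 0).
alignment = [row.index(max(row)) for row in weights[0]]
```

Each row of `weights[h]` sums to 1, so its argmax gives a rough word alignment. In a trained model the projection matrices would be learned, and the alignments would become meaningful.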
Abstract (English)
Deep neural networks (DNNs) have performed impressively in natural language processing. Machine translation is one of the important tasks in natural language processing, and it has depended on two kinds of neural network architectures: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, translation quality depends on vocabulary and grammatical structure. Sentences translated by deep learning models may therefore suffer from problems such as grammatical errors and bilingual vocabulary misalignment. In recent years, a Google team proposed the Transformer, an attention model that uses neither CNNs nor RNNs and achieves significant results by applying the attention mechanism in the encoder and decoder. The architecture proposed in this thesis is based on the Transformer. The model consists of a multi-layer encoder and decoder. It uses multi-head attention to match source-language sentences with target-language sentences and to align the vocabulary of the two languages. The goal of this thesis is to improve translation quality. To that end, we propose an architecture that applies residual and dense connections to the Transformer to avoid information loss: later layers are connected to earlier layers to optimize the model. Finally, we apply the proposed architecture and the baseline model to an English-Chinese translation system and compare their translations using BLEU and WER. The proposed attention architecture translates better than the baseline model.
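The abstract says only that residual and dense connections link earlier layers to later ones; it does not give the exact wiring. The block below is a sketch for illustration, and the wiring is an assumption:

- `residual_stack` uses the post-norm shortcut x ← LayerNorm(x + F(x)) of the original Transformer.
- `dense_stack` feeds each layer the concatenation of the input and all earlier layer outputs, DenseNet-style. A linear map projects that concatenation back to the model width.
- `make_sublayer` is a hypothetical stand-in (one random tanh layer) for a real attention plus feed-forward block.
- All weights are random and untrained.

```python
import math
import random

Vector = list[float]
Weights = list[list[float]]

def random_weights(d_in: int, d_out: int, rng: random.Random) -> Weights:
    """Random (d_in x d_out) matrix; untrained, for illustration only."""
    scale = 1.0 / math.sqrt(d_in)
    return [[rng.gauss(0.0, scale) for _ in range(d_out)] for _ in range(d_in)]

def linear(x: Vector, w: Weights) -> Vector:
    """Row vector x (length d_in) times w (d_in x d_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def layer_norm(x: Vector, eps: float = 1e-6) -> Vector:
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def make_sublayer(d_model: int, rng: random.Random):
    """Hypothetical stand-in for one attention + feed-forward block."""
    w = random_weights(d_model, d_model, rng)
    return lambda x: [math.tanh(v) for v in linear(x, w)]

def residual_stack(x: Vector, layers) -> Vector:
    """Residual wiring: each layer adds its output to its own input, x <- LayerNorm(x + F(x))."""
    for f in layers:
        x = layer_norm([a + b for a, b in zip(x, f(x))])
    return x

def dense_stack(x: Vector, layers, projections: list[Weights]) -> Vector:
    """Dense wiring (assumed): layer l sees the concatenation of the input and all
    earlier outputs, projected back to d_model before the layer is applied."""
    history = [x]
    for f, proj in zip(layers, projections):
        concat = [v for h in history for v in h]  # length d_model * len(history)
        history.append(f(linear(concat, proj)))
    return history[-1]

rng = random.Random(0)
d_model, num_layers = 4, 3
layers = [make_sublayer(d_model, rng) for _ in range(num_layers)]
# Layer l receives l + 1 vectors, so its projection has d_model * (l + 1) rows.
projections = [random_weights(d_model * (l + 1), d_model, rng) for l in range(num_layers)]
x0 = [0.5, -1.0, 0.25, 2.0]
print(residual_stack(x0, layers))
print(dense_stack(x0, layers, projections))
```

The residual variant keeps the vector width fixed at no extra cost. The dense variant lets the last layer read every earlier representation directly, at the price of projection matrices that grow with depth. A "residual dense" combination could wrap each dense layer in the same x + F(x) shortcut.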
Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Objectives
1.2 Research Methods and Chapter Overview
Chapter 2 Literature Review
2.1 Neural Networks
2.1.1 Perceptron Architecture
2.1.2 Multilayer Perceptron
2.2 Deep Neural Networks
2.2.1 Convolutional Neural Networks
2.2.2 Recurrent Neural Networks
2.3 Machine Translation
2.3.1 Traditional Machine Translation
2.3.2 Encoder-Decoder Neural Machine Translation
2.3.3 Sequence-to-Sequence Neural Machine Translation
2.3.4 Attention-Based Neural Machine Translation
2.3.5 Self-Attention-Based Natural Language Processing
2.3.6 CNN-Based Neural Machine Translation
Chapter 3 Deep Residual Dense Attention Neural Network
3.1 TRANSFORMER
3.2 DEEP DENSE ATTENTION-BASED NEURAL NETWORK
3.3 DEEP RESIDUAL ATTENTION-BASED NEURAL NETWORK
3.4 DEEP RESIDUAL DENSE ATTENTION-BASED NEURAL NETWORK
Chapter 4 Experimental Results
4.1 Experimental Environment and Attention Network Settings
4.2 Preprocessing of Translation Data
4.3 Machine Translation Evaluation Criteria
4.4 Experimental Results
Chapter 5 Conclusions and Future Work
Chapter 6 References
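The outline's Section 4.3 covers the evaluation criteria, and the abstract names WER as one of the two metrics. WER is the word-level edit distance between a system translation and a reference, divided by the reference length. The block below is a minimal sketch using the standard Levenshtein dynamic program. It assumes the text is already tokenized. Chinese output would first need word segmentation, or it would be scored per character as in the usage line. The abstract does not say which of these the thesis did.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference).

    Standard Levenshtein dynamic program over tokens, keeping only the
    previous row of the table."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    prev = list(range(len(hypothesis) + 1))  # row 0: insert j hypothesis tokens
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)   # column 0: delete i reference tokens
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1] / len(reference)

# Character-level example on a hypothetical Chinese sentence pair.
print(word_error_rate(list("我喜歡貓"), list("我喜歡狗")))  # prints 0.25 (one substitution, 4 tokens)
```

Lower is better. Because insertions are counted, WER can exceed 1 when the hypothesis is much longer than the reference.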