National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

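Graduate student: 楊芷璇 (Chih-Hsuan Yang)
Thesis title: 基於注意力之英中對譯係統
Thesis title (English): English Chinese Translation System Using Attention Model
Advisor: 王家慶 (Jia-Ching Wang)
Degree: Master's
Institution: National Central University (國立中央大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2018
Academic year of graduation: 106 (ROC academic year)
Language: Chinese
Keywords (Chinese): 機器翻譯、神經機器翻譯、深度學習、自然語言處理
Keywords (English): Machine translation, Neural machine translation, Deep learning, Natural language processing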


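Abstract (translated from the Chinese):

Deep neural networks have achieved remarkable results in natural language processing. Machine translation, an important task in that field, has relied mainly on two neural network architectures: the convolutional neural network (CNN) and the recurrent neural network (RNN). Translation quality, however, depends on the vocabulary and grammatical structure of the languages involved, and sentences produced by deep models often suffer from ungrammatical phrasing and from bilingual sentences or words that fail to align. In recent years a Google team proposed the Transformer, an attention model that uses neither convolutional nor recurrent networks; with only an encoder-decoder model plus an attention mechanism it achieves notable results in machine translation. The architecture proposed in this thesis takes the Transformer as its base model. The model consists of a multi-layer encoder and decoder and uses multi-head attention to match source-language and target-language sentences by similarity, aligning the words of the two languages. The goal of the thesis is to improve translation quality, so the proposed architecture modifies the Transformer by applying residual and dense connections: to avoid losing information as it passes through many layers while the model computes attention, earlier layers are connected to later layers, which improves the trained model. In the experiments, the proposed method and the original Transformer are both applied to an English-Chinese translation system and compared using the machine translation metrics BLEU and WER; the results show that the proposed attention model translates better than the Transformer.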
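The abstract above describes attention layers whose earlier outputs are connected to later layers through residual and dense connections. The following is a minimal sketch of that idea, not the thesis's implementation. It uses scaled dot-product attention as defined for the Transformer, but the sub-layer `self_attention` has no learned projections or multiple heads, and the names `residual_stack` and `dense_stack` are hypothetical. The record does not say how the thesis combines earlier layers, so the dense variant here sums all previous outputs to keep dimensions fixed, whereas DenseNet concatenates them.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scale = math.sqrt(d_k)
    # One row of weights per query: similarity of that query to every key.
    weights = [
        softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        for qi in q
    ]
    # Each output row is the weighted average of the value rows.
    out = [
        [sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
        for row in weights
    ]
    return out, weights

def self_attention(x):
    """Hypothetical sub-layer: attention with Q = K = V = x."""
    out, _ = attention(x, x, x)
    return out

def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def residual_stack(x, num_layers):
    """Residual wiring: each layer's output is added to its own input."""
    for _ in range(num_layers):
        x = add(x, self_attention(x))
    return x

def dense_stack(x, num_layers):
    """Dense wiring (assumed): each layer sees the sum of all earlier outputs."""
    outputs = [x]
    for _ in range(num_layers):
        combined = outputs[0]
        for prev in outputs[1:]:
            combined = add(combined, prev)
        outputs.append(self_attention(combined))
    return outputs[-1]

if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]  # two toy token vectors
    print(residual_stack(tokens, 2))
    print(dense_stack(tokens, 2))
```

In both stacks the input of a later layer carries information from earlier layers directly, rather than only through the chain of intermediate transformations, which is the information-loss concern the abstract raises.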

Abstract (English):

Deep neural networks (DNNs) have performed impressively in natural language processing. Machine translation is one of the important tasks in natural language processing. It relies mainly on two kinds of neural network architecture, the convolutional neural network (CNN) and the recurrent neural network (RNN). However, the quality of machine translation depends on the vocabulary and grammatical structure of the languages, and sentences translated by deep learning models may suffer from problems such as grammatical errors and misaligned bilingual vocabulary. In recent years, the Google team proposed an attention model, the Transformer, which uses neither convolutional nor recurrent neural networks and obtains significant results by applying the attention mechanism to an encoder and a decoder. The architecture proposed in this thesis is based on the Transformer. The model consists of a multilayer encoder and decoder and uses multi-head attention to match the source-language sentence with the target-language sentence and to align the vocabulary of the two languages. The goal of this thesis is to improve the quality of translation results, so we propose an architecture that applies residual and dense connections to the Transformer to avoid information loss: later layers are connected to earlier layers to optimize the model. Finally, in the experiments we apply the proposed architecture and the baseline model to an English-Chinese translation system and use BLEU and WER to compare the translated sentences. The translation results of the proposed attention architecture are better than those of the baseline model.
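The abstract names BLEU and WER as the evaluation metrics. As a rough illustration of what they measure, the sketch below computes WER as the token-level edit distance divided by the reference length, and the clipped (modified) n-gram precision that BLEU is built on. The function names are hypothetical, full BLEU additionally combines several n-gram orders with a brevity penalty, and the record does not state how the thesis tokenizes Chinese output, so the inputs here are simply pre-tokenized lists.

```python
from collections import Counter

def word_error_rate(reference, hypothesis):
    """WER = token-level edit distance / number of reference tokens."""
    if not reference:
        raise ValueError("reference must not be empty")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)

def clipped_precision(reference, hypothesis, n=1):
    """Modified n-gram precision: hypothesis counts are clipped by the reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ref_counts, hyp_counts = ngrams(reference), ngrams(hypothesis)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total

if __name__ == "__main__":
    ref = "the cat is on the mat".split()
    hyp = "the cat on the mat".split()
    print(word_error_rate(ref, hyp))       # one deleted word out of six
    print(clipped_precision(ref, hyp, 2))  # bigram precision
```

Lower WER is better and higher BLEU is better, so a comparison between two systems reads the two metrics in opposite directions.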

Table of contents:

Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Objectives
1.2 Research Method and Chapter Overview
Chapter 2 Literature Review
2.1 Artificial Neural Networks
2.1.1 Perceptron Architecture
2.1.2 Multilayer Perceptron
2.2 Deep Neural Networks
2.2.1 Convolutional Neural Networks
2.2.2 Recurrent Neural Networks
2.3 Machine Translation
2.3.1 Traditional Machine Translation
2.3.2 Encoder-Decoder Neural Machine Translation
2.3.3 Sequence-to-Sequence Neural Machine Translation
2.3.4 Attention-Based Neural Machine Translation
2.3.5 Self-Attention-Based Natural Language Processing
2.3.6 Convolutional-Neural-Network-Based Neural Machine Translation
Chapter 3 Deep Residual Dense Attention Neural Network
3.1 Transformer
3.2 Deep Dense Attention-Based Neural Network
3.3 Deep Residual Attention-Based Neural Network
3.4 Deep Residual Dense Attention-Based Neural Network
Chapter 4 Experimental Results
4.1 Experimental Environment and Attention Network Settings
4.2 Preprocessing of Translation Data
4.3 Machine Translation Evaluation Criteria
4.4 Experimental Results
Chapter 5 Conclusions and Future Work
Chapter 6 References

