Graduate Student: 黃文傑
Title (English): An Emotional Dialogue System Using Conditional Generative Adversarial Networks Based on Sequence-to-Sequence with the Transformer Encoder
Advisor: 薛幼苓 (Yuling Hsueh)
Committee: 薛幼苓 (Yuling Hsueh), 張榮貴, 曾若涵
Oral Defense Date: 2020-07-28
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Graduate Institute of Computer Science and Information Engineering (資訊工程研究所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2020
Academic Year of Graduation: 108
Language: English
Pages: 50
Keywords (Chinese): 中文對話生成、情緒商數、自然語言處理、深度學習、門控循環單元、序列到序列模型、注意力機制、變壓器模型、條件對抗生成網絡、條件變分自動編碼器
Keywords (English): Chinese conversation generation; Emotional intelligence; Natural language processing; Deep learning; Gated recurrent units; Sequence-to-sequence; Attention mechanism; Transformer model; Conditional generative adversarial nets; Conditional variational autoencoder
Statistics:
  • Citations: 0
  • Views: 40
  • Downloads: 0
  • Bookmarked: 0

Understanding the expression of emotion and generating appropriate responses are key steps toward constructing emotional conversational agents. In this paper, we propose a framework for single-turn emotional conversation generation. Our model comprises three main components: a sequence-to-sequence model with stacked encoders, a conditional variational autoencoder, and conditional generative adversarial networks. Our approach makes three key contributions. First, for the sequence-to-sequence model with stacked encoders, we design a two-layer encoder by combining the Transformer with a gated recurrent unit-based recurrent neural network. Second, owing to the flexibility of the sequence-to-sequence model, we adopt a conditional variational autoencoder in our framework, which uses latent variables to learn a distribution over potential dialogue intents and to generate diverse responses. Third, we regard the conditional variational autoencoder-based sequence-to-sequence model as the generative model, whose training is assisted by both a content discriminator and an emotion classifier; this helps our model promote content information and emotion expression. We use automated evaluation and human evaluation to compare our model against baselines on the NTCIR short text conversation task (STC-3) Chinese emotional conversation generation (CECG) subtask dataset [1], and the experimental results demonstrate that our proposed framework can generate semantically reasonable and emotionally appropriate responses.
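The conditional variational autoencoder component described above relies on two standard ingredients: a reparameterized sample of the latent variable and a KL-divergence regularizer against a standard normal prior. The sketch below illustrates only these two generic operations, not the thesis's actual networks; all names (`latent_dim`, `reparameterize`, `kl_divergence`) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the CVAE machinery, assuming a diagonal-Gaussian
# posterior q(z | x, y) with parameters (mu, log_var) predicted by a
# recognition network (not shown). Names are hypothetical, not from
# the thesis implementation.

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
latent_dim = 16
mu = np.zeros((1, latent_dim))       # posterior mean from the recognition net
log_var = np.zeros((1, latent_dim))  # log-variance; zeros give a unit Gaussian
z = reparameterize(mu, log_var, rng)

# When the posterior equals the standard-normal prior, the KL term vanishes.
assert np.allclose(kl_divergence(mu, log_var), 0.0)
```

During training, the decoder's reconstruction loss would be combined with this KL term; sampling `eps` outside the deterministic transform is what keeps the sampling step differentiable with respect to `mu` and `log_var`.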

Acknowledgements ii
Abstract (in Chinese) iii
Abstract iv
1 Introduction 1
2 Related Work 4
3 Background 7
3.1 GRU-based Sequence-to-Sequence Model . . . . . . . . . . . . . . . . . 7
3.2 Attention Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.3 Transformer-based Sequence-to-Sequence Model . . . . . . . . . . . . . 9
3.4 Conditional Variational Autoencoder . . . . . . . . . . . . . . . . . . . . 10
3.5 Generative Adversarial Nets . . . . . . . . . . . . . . . . . . . . . . . . 11
4 Model Architecture 14
4.1 Emotional Sequence-to-Sequence Model with Stacked Encoders . . . . . 15
4.2 Conditional Variational Autoencoder for Dialogue Generation . . . . . . 17
4.3 Conditional Generative Adversarial Nets for Dialogue Generation . . . . 18
4.3.1 CGAN Framework . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.3.2 Adversarial Training for the Generative Model . . . . . . . . . . 20
5 Experiments 23
5.1 Dataset and Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.2 Baselines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2.1 Seq2Seq . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2.2 CVAE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2.3 CGAN-CVAE . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.3 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3.1 Perplexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3.2 Accuracy of Expressed Emotions . . . . . . . . . . . . . . . . . 25
5.3.3 Human Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.4 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.4.1 Emotion Classifier for Automated Evaluation . . . . . . . . . . . 26
5.4.2 Our Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6 Experimental Results and Analysis 29
6.1 Parameters of the Experimental Settings . . . . . . . . . . . . . . . . . 29
6.2 Automated Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.3 Human Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.4 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7 Conclusions 34
Bibliography 36
Bibliography
[1] H. Zhou, M. Huang, T. Zhang, X. Zhu, and B. Liu, “Emotional chatting machine:
Emotional conversation generation with internal and external memory,” in Proceedings
of AAAI Conference on Artificial Intelligence, pp. 730–738, 2018.
[2] O. Vinyals and Q. Le, “A neural conversational model,” in Proceedings of ICML
Deep Learning Workshop, p. 37, 2015.
[3] J. Li, M. Galley, C. Brockett, J. Gao, and B. Dolan, “A diversity-promoting objective
function for neural conversation models,” in Proceedings of the 2016 Conference
of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, (San Diego, California), pp. 110–119, Association
for Computational Linguistics, June 2016.
[4] R. W. Picard, Affective Computing. Cambridge, MA, USA: MIT Press, 1997.
[5] J. Lüdtke and A. Jacobs, “The emotion potential of simple sentences: additive or
interactive effects of nouns and adjectives?,” Frontiers in Psychology, vol. 6, p. 1137,
2015.
[6] H. Prendinger and M. Ishizuka, “The empathic companion: A character-based interface
that addresses users’ affective states,” Applied Artificial Intelligence, vol. 19,
pp. 267–285, Mar. 2005.
[7] H. Prendinger, J. Mori, and M. Ishizuka, “Using human physiology to evaluate
subtle expressivity of a virtual quizmaster in a mathematical game,” Int. J. Hum.-
Comput. Stud., vol. 62, pp. 231–245, Feb. 2005.
[8] C. Huang, O. Zaïane, A. Trabelsi, and N. Dziri, “Automatic dialogue generation with
expressed emotions,” in Proceedings of the 2018 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies,
Volume 2 (Short Papers), (New Orleans, Louisiana), pp. 49–54, Association
for Computational Linguistics, June 2018.
[9] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser,
and I. Polosukhin, “Attention is all you need,” in Proceedings of the 31st International
Conference on Neural Information Processing Systems, NIPS’17, (Red Hook,
NY, USA), pp. 6000–6010, Curran Associates Inc., 2017.
[10] X. Kong, B. Li, G. Neubig, E. Hovy, and Y. Yang, “An adversarial approach to
high-quality, sentiment-controlled neural dialogue generation,” in Proceedings of
AAAI 2019 Workshop on Reasoning and Learning for Human-Machine Dialogues
(DEEP-DIAL 2019), (Honolulu, Hawaii), January 2019.
[11] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural
networks,” in Proceedings of the 27th International Conference on Neural Information
Processing Systems - Volume 2, NIPS’14, (Cambridge, MA, USA), pp. 3104–
3112, MIT Press, 2014.
[12] J. Li, W. Monroe, A. Ritter, D. Jurafsky, M. Galley, and J. Gao, “Deep reinforcement
learning for dialogue generation,” in Proceedings of the 2016 Conference on Empirical
Methods in Natural Language Processing, (Austin, Texas), pp. 1192–1202,
Association for Computational Linguistics, Nov. 2016.
[13] J. Li, W. Monroe, T. Shi, S. Jean, A. Ritter, and D. Jurafsky, “Adversarial learning
for neural dialogue generation,” in Proceedings of the 2017 Conference on Empirical
Methods in Natural Language Processing, (Copenhagen, Denmark), pp. 2157–2169,
Association for Computational Linguistics, Sept. 2017.
[14] A. See, P. J. Liu, and C. D. Manning, “Get to the point: Summarization with pointer-generator
networks,” in Proceedings of the 55th Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long Papers), (Vancouver, Canada),
pp. 1073–1083, Association for Computational Linguistics, July 2017.
[15] J. Gu, Z. Lu, H. Li, and V. O. Li, “Incorporating copying mechanism in sequence-to-sequence
learning,” in Proceedings of the 54th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), (Berlin, Germany), pp. 1631–
1640, Association for Computational Linguistics, Aug. 2016.
[16] J. Li, M. Galley, C. Brockett, G. Spithourakis, J. Gao, and B. Dolan, “A persona-based
neural conversation model,” in Proceedings of the 54th Annual Meeting of
the Association for Computational Linguistics (Volume 1: Long Papers), (Berlin,
Germany), pp. 994–1003, Association for Computational Linguistics, Aug. 2016.
[17] J. Herzig, M. Shmueli-Scheuer, T. Sandbank, and D. Konopnicki, “Neural response
generation for customer service based on personality traits,” in Proceedings of
the 10th International Conference on Natural Language Generation, (Santiago de
Compostela, Spain), pp. 252–256, Association for Computational Linguistics, Sept.
2017.
[18] B. Reeves and C. Nass, The Media Equation: How People Treat Computers, Television,
and New Media like Real People and Places. USA: Cambridge University
Press, 1996.
[19] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent
neural networks on sequence modeling,” in Proceedings of NIPS 2014 Workshop
on Deep Learning, December 2014, 2014.
[20] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning
to align and translate,” in 3rd International Conference on Learning Representations,
ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings
(Y. Bengio and Y. LeCun, eds.), 2015.
[21] T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based
neural machine translation,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, (Lisbon, Portugal), pp. 1412–1421, Association
for Computational Linguistics, Sept. 2015.
[22] K. Sohn, X. Yan, and H. Lee, “Learning structured output representation using deep
conditional generative models,” in Proceedings of the 28th International Conference
on Neural Information Processing Systems - Volume 2, NIPS’15, (Cambridge, MA,
USA), pp. 3483–3491, MIT Press, 2015.
[23] T. Zhao, R. Zhao, and M. Eskenazi, “Learning discourse-level diversity for neural
dialog models using conditional variational autoencoders,” in Proceedings of the
55th Annual Meeting of the Association for Computational Linguistics (Volume 1:
Long Papers), vol. 1, pp. 654–664, 2017.
[24] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,
A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proceedings of the
27th International Conference on Neural Information Processing Systems - Volume
2, NIPS’14, (Cambridge, MA, USA), pp. 2672–2680, MIT Press, 2014.
[25] R. J. Williams, “Simple statistical gradient-following algorithms for connectionist
reinforcement learning,” Machine Learning, vol. 8, pp. 229–256, May 1992.
[26] M. Mirza and S. Osindero, “Conditional generative adversarial nets,” CoRR,
vol. abs/1411.1784, 2014.
[27] X. Zhou and W. Y. Wang, “MojiTalk: Generating emotional responses at scale,”
in Proceedings of the 56th Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), (Melbourne, Australia), pp. 1128–1137, Association
for Computational Linguistics, July 2018.
[28] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” in Proceedings of
2nd International Conference on Learning Representations, ICLR 2014, Banff, AB,
Canada, April 14-16, 2014, Conference Track Proceedings (Y. Bengio and Y. LeCun,
eds.), 2014.
[29] S. R. Bowman, L. Vilnis, O. Vinyals, A. Dai, R. Jozefowicz, and S. Bengio, “Generating
sentences from a continuous space,” in Proceedings of The 20th SIGNLL Conference
on Computational Natural Language Learning, (Berlin, Germany), pp. 10–
21, Association for Computational Linguistics, Aug. 2016.
[30] L. Shang, Z. Lu, and H. Li, “Neural responding machine for short-text conversation,”
in Proceedings of the 53rd Annual Meeting of the Association for Computational
Linguistics and the 7th International Joint Conference on Natural Language
Processing (Volume 1: Long Papers), (Beijing, China), pp. 1577–1586, Association
for Computational Linguistics, July 2015.
[31] C.-W. Liu, R. Lowe, I. Serban, M. Noseworthy, L. Charlin, and J. Pineau, “How
NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation
metrics for dialogue response generation,” in Proceedings of the 2016 Conference
on Empirical Methods in Natural Language Processing, (Austin, Texas),
pp. 2122–2132, Association for Computational Linguistics, Nov. 2016.
[32] A. Graves, S. Fernández, and J. Schmidhuber, “Bidirectional LSTM networks for improved
phoneme classification and recognition,” in Proceedings of the 15th International
Conference on Artificial Neural Networks: Formal Models and Their Applications
- Volume Part II, ICANN’05, (Berlin, Heidelberg), pp. 799–804, Springer-
Verlag, 2005.
[33] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, “Bag of tricks for efficient text
classification,” in Proceedings of the 15th Conference of the European Chapter of
the Association for Computational Linguistics: Volume 2, Short Papers, (Valencia,
Spain), pp. 427–431, Association for Computational Linguistics, Apr. 2017.
[34] Y. Kim, “Convolutional neural networks for sentence classification,” in Proceedings
of the 2014 Conference on Empirical Methods in Natural Language Processing
(EMNLP), (Doha, Qatar), pp. 1746–1751, Association for Computational Linguistics,
Oct. 2014.
[35] Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu, “ERNIE: Enhanced
language representation with informative entities,” in Proceedings of the 57th Annual
Meeting of the Association for Computational Linguistics, (Florence, Italy),
pp. 1441–1451, Association for Computational Linguistics, July 2019.

Electronic full text (public Internet release date: 2025-08-24)