Graduate Student: Hao-Yi Wang (王豪逸)
Thesis Title: The Impacts of Image Contexts on Dialogue Systems
Advisor: Wei-Po Lee (李偉柏)
Degree: Master's
Institution: National Sun Yat-sen University
Department: Institute of Information Management
Discipline: Computing
Field: General Computing
Thesis Type: Academic thesis
Publication Year: 2018
Graduation Academic Year: 106 (2017–2018)
Language: Chinese
Pages: 61
Keywords: Image recognition, Natural language, Dialogue, Machine learning, Neural networks, Recurrent neural networks, Convolutional neural networks
Conversing with computers is increasingly common in daily life. Such systems are used not only to execute commands and look up information, but also for companionship and entertainment, chatting with people at any time like a real friend. Most past dialogue systems only produced canned replies in response to commands, so users could sense no vitality in them and still treated them as computer systems. To build a more realistic dialogue system, this study trains neural networks on dialogues that are closer to everyday life. Everyday conversation is not merely simple question answering: replies are not just single nouns or yes/no answers, but need to be diverse and interesting. Conversation also responds not only to the words spoken but to the people and things in view, producing responses that fit the situation. This study incorporates both elements into the dialogue system, using the subtitles of TV series as data. Besides supplying diverse conversational responses, the series also provide video frames that give the dialogue system additional context.

This study uses a deep network model that combines convolutional neural networks, which have performed well in image recognition in recent years, with recurrent neural networks, which excel at natural language, to build a dialogue system that considers both images and text. The research examines how images help during dialogue, whether the open-ended conversations in TV subtitles are suitable for training, whether the plausibility of the generated responses can be quantified, and uses human evaluation to verify whether the model can produce dialogue that people find reasonable.
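The abstract raises the question of quantifying how plausible a generated response is. A common automatic measure for generated text is n-gram overlap with a reference reply, as in BLEU. The sketch below is a minimal illustration of that idea, not the thesis's actual metric: a simplified sentence-level BLEU using unigram and bigram precision with a brevity penalty.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_counts = Counter(ngrams(cand, n))
        r_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # guard log(0)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

An identical candidate and reference score 1.0, while a reply sharing no words scores near 0; real evaluations typically use an established implementation with up to 4-grams and smoothing.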
Chatting with machines is not only possible but increasingly common in our lives. Through such systems we can not only execute commands but also obtain companionship and entertainment. In the past, most dialogue systems simply returned canned replies based on the instructions they received, so it was hard for people to sense any vitality in the machine; users still regarded their chatting partners as computer systems. To develop a more realistic dialogue system, this study adopts deep neural networks trained on more lifelike dialogues. Everyday conversation involves more than simple question answering, and replies are rarely as short as a single noun or a yes-or-no answer; diverse and interesting responses are needed. Moreover, suitable responses depend not only on the contents of the conversation but also on the environmental context. This study develops a dialogue system that takes both factors into account, using TV series as the training datasets. These datasets contain not only conversational content but also the video frames that capture the contexts and situations in which the conversations occur.
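The datasets described above pair subtitle lines with the video frames shown while they are spoken. As an illustrative sketch only (the thesis's actual preprocessing is not shown here, and the function names are hypothetical), consecutive SRT subtitle lines can be turned into (context, response) training pairs, each tagged with the timestamp at which a frame would be sampled:

```python
import re

def parse_srt(srt_text):
    """Parse SRT subtitle text into (start_seconds, line) pairs."""
    entries = []
    # Each SRT block: index, "HH:MM:SS,mmm --> HH:MM:SS,mmm", text lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        m = re.match(r"(\d+):(\d+):(\d+),(\d+)", lines[1])
        if not m:
            continue
        h, mi, s, ms = map(int, m.groups())
        entries.append((h * 3600 + mi * 60 + s + ms / 1000.0,
                        " ".join(lines[2:])))
    return entries

def make_pairs(entries, max_gap=5.0):
    """Pair consecutive lines as (frame_time, context, response).

    frame_time marks where a video frame would be sampled; pairs more
    than max_gap seconds apart are assumed to span a scene change and
    are skipped."""
    return [(t1, u1, u2)
            for (t1, u1), (t2, u2) in zip(entries, entries[1:])
            if t2 - t1 <= max_gap]

demo_srt = """1
00:00:01,000 --> 00:00:02,000
Hello.

2
00:00:03,000 --> 00:00:04,000
Hi there.

3
00:01:00,000 --> 00:01:02,000
New scene."""

pairs = make_pairs(parse_srt(demo_srt))  # → [(1.0, 'Hello.', 'Hi there.')]
```

The `max_gap` threshold is one simple heuristic for filtering out line pairs that are unlikely to be a real exchange.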

To explore the effect of image context on the utterances in a dialogue, this work uses a deep neural network model that combines a convolutional neural network (which works well for image recognition) with a recurrent neural network (which performs strongly on natural language). The aim is to build a dialogue system that considers both images and utterances, and to find better ways to evaluate its responses, including defining a quantitative measurement and designing a questionnaire to verify the models learned from the datasets.
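One simple way to make a recurrent text encoder consider both images and utterances, as described above, is to initialize its hidden state from a CNN feature vector. The sketch below is a minimal NumPy illustration of that idea, not the thesis's actual architecture: the class name and dimensions are hypothetical, and random vectors stand in for real word embeddings and CNN features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ImageConditionedEncoder:
    """Encode an utterance with a GRU whose initial hidden state is a
    linear projection of a CNN image-feature vector, so the recurrent
    encoding of the text is conditioned on the visual context."""

    def __init__(self, emb_dim, hid_dim, img_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.W = rng.normal(0, s, (3, emb_dim, hid_dim))  # input weights
        self.U = rng.normal(0, s, (3, hid_dim, hid_dim))  # recurrent weights
        self.b = np.zeros((3, hid_dim))                    # gate biases
        self.P = rng.normal(0, s, (img_dim, hid_dim))      # image projection

    def __call__(self, tokens, img_feat):
        h = np.tanh(img_feat @ self.P)  # h0 comes from the image context
        for x in tokens:                # one GRU step per token embedding
            z = sigmoid(x @ self.W[0] + h @ self.U[0] + self.b[0])  # update
            r = sigmoid(x @ self.W[1] + h @ self.U[1] + self.b[1])  # reset
            n = np.tanh(x @ self.W[2] + (r * h) @ self.U[2] + self.b[2])
            h = (1 - z) * h + z * n
        return h

# Demo: encode the same 5-token utterance under two different image contexts.
enc = ImageConditionedEncoder(emb_dim=16, hid_dim=32, img_dim=64)
demo_rng = np.random.default_rng(1)
tokens = demo_rng.normal(size=(5, 16))       # placeholder word embeddings
h_a = enc(tokens, demo_rng.normal(size=64))  # encoding under image A
h_b = enc(tokens, demo_rng.normal(size=64))  # encoding under image B
```

Because the two image features differ, the same utterance yields two different encodings; in a trained system the final state would feed a decoder that generates the response.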
Abstract (Chinese)
Abstract (English)
1. Introduction
1-1. Research Background
1-2. Research Motivation and Objectives
2. Literature Review
2-1. Dialogue
2-1-1. Recurrent Neural Networks
2-1-2. The SeqToSeq Model
2-1-3. Dialogue Systems
2-2. Image Processing
2-2-1. Still-Image Processing
2-2-2. Video Processing
2-3. Images and Natural Language
3. Methodology
3-1. Datasets
3-1-1. Dataset Overview
3-1-2. Dataset Selection
3-1-3. Subtitle Processing
3-2. Evaluation Methods
3-2-1. Automatic Evaluation
3-2-2. Human Evaluation
3-3. Models
3-3-1. Image-and-Utterance Model
3-3-2. Utterance-Only Model
3-3-3. Similarity Model
4. Experiments and Results
4-1. Dataset Analysis
4-2. Model Training
4-2-1. Frame Extraction
4-2-2. Training Strategy
4-2-3. Convergence
4-3. The Impact of Images
4-3-1. Impact on Model Output
4-3-2. Image Analysis
4-4. Training Results
4-4-1. Learning on the Training Data
4-4-2. Similarity Model
4-4-3. Human Evaluation Results
4-5. Discussion
5. Conclusions and Future Work
5-1. Conclusions
5-2. Future Work
References
Appendix: Human Evaluation Questionnaire
Electronic full text (publicly available online from 2023-07-17)