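Author: 方楷文
Author (romanized): FANG, KAI-WEN
Thesis title (Chinese): 使用詞向量加權計算之聊天機器人的建置與效益
Thesis title (English): Implementation and Effectiveness for Chatbot Using Weighted Average Word Embedding
Advisor: 吳帆
Advisor (romanized): WU, FAN
Oral defense committee: 吳帆, 許巍嚴, 洪銘建
Oral defense committee (romanized): WU, FAN; HSU, WEI-YEN; HUNG, MING-CHIEN
Oral defense date: 2020-07-29
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Department of Information Management (資訊管理系研究所)
Field: Computing
Subfield: General computing
Thesis type: Academic thesis
Publication year: 2020
Graduation academic year: 108
Language: Chinese
Pages: 55
Chinese keywords: 聊天機器人, Word2Vec, 詞向量, TF-IDF, 支援向量機
English keywords: Chatbot, Word2Vec, Word vector, TF-IDF, SVM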
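With continued advances in deep learning and natural language processing, and with the spread of e-commerce and instant messaging software, demand for question-answering chatbots keeps rising. However, user messages vary widely, a single question can be phrased in several ways, and messages often fall outside the scope of the predefined Q&A set, which makes applying chatbots to frequently asked questions more complex and challenging.
This thesis designs and proposes a retrieval-based general-purpose chatbot system. It uses a large number of short dialogues from movie and TV series subtitles as the source of an open-domain Q&A set, and uses that set to handle messages that fall outside the predefined Q&A set. Within the system architecture, we use a pre-trained Word2Vec word vector model together with TF-IDF (Term Frequency-Inverse Document Frequency) to obtain a weighted vector for each word, and then compute the weighted average vector of each question text. Combining a Support Vector Machine (SVM) classification model, cosine similarity, the open-domain Q&A set, push notifications, and error handling, we build a Q&A chatbot system that retrieves at the semantic level and whose reply accuracy improves the longer it is used. Finally, we implement customer service Q&A chatbots for two businesses in different domains. Analysis of the scenario tasks and questionnaire shows that about 80% of participants felt they could obtain answers to general questions quickly and correctly. We hope the proposed chatbot system can serve as a reference for third parties developing Q&A chatbots and can deliver good performance.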
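
The weighted-averaging step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the toy `EMBEDDINGS` table stands in for a pre-trained Word2Vec model, the smoothed IDF formula and the normalization by the sum of weights are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy embeddings standing in for a pre-trained Word2Vec model.
EMBEDDINGS = {
    "store": [1.0, 0.0],
    "open": [0.0, 1.0],
    "hours": [0.0, 1.0],
    "price": [1.0, 1.0],
}

def inverse_document_frequency(corpus):
    """Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    doc_freq = Counter(token for doc in corpus for token in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}

def weighted_average_vector(tokens, idf, embeddings):
    """Average the word vectors of `tokens`, each weighted by TF * IDF."""
    # Only tokens known to both the embedding table and the IDF table count.
    term_counts = Counter(t for t in tokens if t in embeddings and t in idf)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in term_counts.items():
        weight = (count / len(tokens)) * idf[tok]
        weight_sum += weight
        for i, x in enumerate(embeddings[tok]):
            total[i] += weight * x
    if weight_sum == 0.0:
        return total  # no known tokens: fall back to the zero vector
    return [x / weight_sum for x in total]

# Two tokenized questions play the role of the Q&A set.
corpus = [["store", "open", "hours"], ["store", "price"]]
idf = inverse_document_frequency(corpus)
question_vec = weighted_average_vector(["store", "open"], idf, EMBEDDINGS)
```

In this toy corpus "store" appears in both questions, so its IDF is lower than that of "open" and the resulting vector leans toward the rarer word.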

With advances in deep learning and natural language processing and the popularity of e-commerce and instant messaging software, demand for customer service Q&A chatbots has risen steadily in recent years. However, users may send different types of messages to express the same thing. There may be hundreds of ways to phrase a single question, and users' questions sometimes fall outside the scope of the default Q&A dataset. Constructing a Q&A chatbot system for customer service is therefore complex and challenging.
In this thesis, we build a retrieval-based general chatbot system and exploit the large amount of short-text conversation available in movie subtitles as the source of the open-domain Q&A dataset. The method uses Word2Vec as the pre-trained word vector model, an algorithm for computing the weighted average vector of each question, an SVM classifier to classify the user's question, cosine similarity to measure the similarity between the user's message and each question in the dataset, and push notifications and error handling. To evaluate the proposed system, we used a question set we created as test data, together with a user survey. The results show that almost 80% of users think the chatbot can quickly and correctly obtain answers in each task. We expect that the proposed general chatbot system can be applied to automatic responses for common Q&A and will perform well.
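
The similarity-matching step can be sketched as follows, under stated assumptions. The abstract says only that cosine similarity is computed between the user's message and each stored question. The `threshold` parameter, the `best_match` function name, and the convention of returning `None` when no stored question is close enough (so that a caller could fall back to the open-domain dataset) are hypothetical choices made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query_vec, qa_items, threshold=0.8):
    """Return (answer, score) for the most similar stored question.

    `qa_items` is a list of (question_vector, answer) pairs. If the best
    score is below `threshold` (an assumed cutoff), the answer is None so
    the caller can fall back to another source.
    """
    best_answer, best_score = None, -1.0
    for question_vec, answer in qa_items:
        score = cosine_similarity(query_vec, question_vec)
        if score > best_score:
            best_answer, best_score = answer, score
    if best_score < threshold:
        return None, best_score
    return best_answer, best_score

# Toy Q&A set: the vectors would normally come from the vectorization step.
qa_items = [
    ([1.0, 0.0], "We are open 9:00 to 18:00."),
    ([0.0, 1.0], "Prices are listed on the menu page."),
]
answer, score = best_match([0.9, 0.1], qa_items)
fallback_answer, fallback_score = best_match([1.0, 1.0], qa_items)
```

Here the first query lies close to the first stored question and returns its answer, while the second query sits halfway between both stored questions, scores below the assumed cutoff, and returns `None`.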

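List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Chapter Overview
Chapter 2 Literature Review
2.1 Chatbots
2.2 Natural Language Processing
2.2.1 Word Segmentation
2.2.2 Part-of-Speech Tagging
2.2.3 Stop Word Removal
2.2.4 Vector Space Representation
2.3 Word Vectors
2.3.1 Word2Vec
2.3.2 CBOW and Skip-gram
Chapter 3 Architecture and Design of the General-Purpose Chatbot System
3.1 System Architecture and Design
3.2 Word Vector Training Phase
3.2.1 Training Text Dataset
3.2.2 Data Preprocessing
3.2.3 Word2Vec Word Model Training
3.3 Runtime Workflow
3.3.1 Q&A Dataset
3.3.2 Message Text Preprocessing
3.3.3 Text Vectorization
3.3.4 Question Text Classification
3.3.5 Message Similarity Matching
3.3.6 Error and Push Notification Handling
Chapter 4 System Implementation and Evaluation
4.1 Development Environment and Platform
4.2 Parameter Settings
4.3 System Presentation
4.3.1 Implementation Targets and Domains
4.3.2 System Implementation and Presentation
4.4 Performance Evaluation
4.4.1 Reply Appropriateness Test
4.4.2 User Questionnaire Design and Statistics
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References
Appendix: Questionnaire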


Electronic full text (public Internet release date: 2021-07-30)