National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: 溫庭豐
Author (English): WEN, TING-FENG
Title: 利用聲音和語意分析之人類情緒辨識的研究
Title (English): A Study of Human Emotion Recognition Using Audio and Semantic Analysis
Advisors: 葉生正, 李明哲
Advisors (English): YEH, SHENG-CHENG; LEE, MING-CHE
Committee Members: 謝朝和, 邱奕世, 魯大德
Committee Members (English): HSIEH, CHAUR-HEH; CHIOU, YIH-SHYH; LU, TA-TE
Oral Defense Date: 2018-07-13
Degree: Master's
Institution: 銘傳大學 (Ming Chuan University)
Department: 資訊傳播工程學系碩士班
Discipline: Communication
Subfield: General Mass Communication
Thesis Type: Academic thesis
Publication Year: 2018
Graduation Academic Year: 106 (2017-2018 academic year)
Language: Chinese
Pages: 71
Keywords (Chinese): 聲音情緒辨識, 語意分析, 聲音數據增強, 深度學習, Android
Keywords (English): Audio Emotion Recognition, Semantic Analysis, Voice Data Augmentation, Deep Learning, Android
Usage statistics:
  • Cited by: 1
  • Views: 1197
  • Downloads: 73
  • Saved to reading lists: 0
Abstract: With advances in science and technology, research on audio emotion recognition and semantic analysis has become increasingly important; current applications include companion robots, consumer technology products, and medical care. This study proposes an audio emotion recognition and chat dialogue system. For emotion recognition, recordings are first preprocessed with voice data augmentation and converted into spectrograms using the Short-Time Fourier Transform (STFT); a convolutional neural network (CNN) based on the GoogLeNet architecture then classifies each recording into one of five emotions: calm, happy, sad, angry, or fearful, with an average accuracy of 77.1%. For semantic analysis, the chat corpus is divided into positive and negative texts for training, and a Seq2Seq recurrent neural network (RNN) architecture generates the conversational replies. The system is divided into a client and a server: the client is an Android application running on a mobile phone, and the server is built on Ubuntu Linux together with a web server. A user records speech through the mobile app, the audio file is uploaded to the server, the CNN recognizes the emotion, and the RNN performs semantic analysis to drive the chat dialogue. Depending on the recognized emotion, the system returns a positive or negative reply, and the result is displayed in the user's mobile app.
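The abstract describes converting each recording into a spectrogram with the Short-Time Fourier Transform (STFT) before classifying it with a GoogLeNet CNN. The Python sketch below illustrates that conversion step only, assuming a WAV input; the file names, window length, overlap, and figure size are illustrative assumptions, not values taken from the thesis.

```python
# Minimal STFT-to-spectrogram sketch (illustrative parameters, not the thesis's exact pipeline).
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, e.g. on a headless server
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import stft

sample_rate, samples = wavfile.read("utterance.wav")  # hypothetical input recording
if samples.ndim > 1:                                   # mix stereo down to mono
    samples = samples.mean(axis=1)

# Short-Time Fourier Transform: 512-sample windows with 50% overlap (assumed values)
freqs, times, Zxx = stft(samples, fs=sample_rate, nperseg=512, noverlap=256)
magnitude_db = 20 * np.log10(np.abs(Zxx) + 1e-10)      # magnitude spectrogram in dB

# Save the spectrogram as an image; a GoogLeNet-style CNN would consume images like this.
plt.figure(figsize=(2.24, 2.24), dpi=100)              # small square figure (about 224 px per side)
plt.pcolormesh(times, freqs, magnitude_db, shading="gouraud")
plt.axis("off")
plt.savefig("utterance_spectrogram.png", bbox_inches="tight", pad_inches=0)
plt.close()
```

Images generated this way could then be labeled with the five emotion classes (calm, happy, sad, angry, fearful) and used to train or query the CNN classifier.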
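The abstract also outlines the client/server flow: the Android app records speech, uploads the audio file to the Ubuntu server, the CNN recognizes the emotion, a Seq2Seq RNN produces the chat reply, and the result is returned to the phone. The sketch below shows one way such an endpoint could look, assuming a Flask web server (one of the tools the thesis lists); the route name, request fields, and the two helper functions are hypothetical placeholders rather than the thesis's actual implementation.

```python
# Hypothetical server endpoint sketching the upload -> recognize -> reply flow.
from flask import Flask, request, jsonify

app = Flask(__name__)

def recognize_emotion(wav_path: str) -> str:
    """Placeholder for the GoogLeNet spectrogram classifier (CNN)."""
    return "calm"  # one of: calm, happy, sad, angry, fearful

def generate_reply(text: str, emotion: str) -> str:
    """Placeholder for the Seq2Seq chat model (RNN), steered by the detected emotion."""
    positive = emotion in ("calm", "happy")
    return "Glad to hear that!" if positive else "I'm here with you; it will be okay."

@app.route("/analyze", methods=["POST"])
def analyze():
    audio = request.files["audio"]               # recording uploaded by the Android app
    audio.save("upload.wav")
    emotion = recognize_emotion("upload.wav")    # CNN emotion recognition
    transcript = request.form.get("text", "")    # speech-to-text result, if the client sends one
    reply = generate_reply(transcript, emotion)  # RNN chat response, positive or negative
    return jsonify({"emotion": emotion, "reply": reply})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

The Android client would then POST its recorded audio (and optionally a transcript) to /analyze and display the returned emotion label and reply to the user.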
Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2: Related Technologies and Literature Review
2.1 Companion Robots
2.1.1 The Paro Robot
2.1.2 The Kuri Robot
2.1.3 The Zenbo Robot
2.1.4 Chatbot: 萬小芳
2.2 Deep Learning
2.2.1 Convolutional Neural Networks
2.2.2 Recurrent Neural Networks
2.3 Speech Recognition Technology
2.4 Principles of Musical Tones
2.4.1 Semitones and Whole Tones
2.4.2 Twelve-Tone Equal Temperament
2.5 Word-to-Vector Techniques
Chapter 3: System Architecture and Research Methods
3.1 System Architecture
3.1.1 Client
3.1.2 Server
3.1.3 Convolutional Neural Network Model
3.1.4 Recurrent Neural Network Model
3.2 Research Methods
3.2.1 Collection of Emotional Audio
3.2.2 Audio-to-Spectrogram Conversion
3.2.3 Audio Segmentation
3.2.4 Voice Data Augmentation
3.2.5 Gender Recognition
3.2.6 Chat Dialogue Method
Chapter 4: Results and Analysis
4.1 System Functions
4.1.1 Client
4.1.2 Server
4.2 Analysis of the Original Method
4.3 Analysis of the Audio Segmentation Method
4.4 Analysis of the Voice Data Augmentation Method
4.5 Analysis of the Gender Recognition Method
4.6 Analysis of the Chat Dialogue Method
4.7 Overall Comparison and Analysis of Results
Chapter 5: Conclusions and Future Work
References
