National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: 溫庭豐
Author (English): WEN, TING-FENG
Title: 利用聲音和語意分析之人類情緒辨識的研究
Title (English): A Study of Human Emotion Recognition Using Audio and Semantic Analysis
Advisors: 葉生正, 李明哲
Advisors (English): YEH, SHENG-CHENG; LEE, MING-CHE
Committee Members: 謝朝和, 邱奕世, 魯大德
Committee Members (English): HSIEH, CHAUR-HEH; CHIOU, YIH-SHYH; LU, TA-TE
Oral Defense Date: 2018-07-13
Degree: Master's
Institution: 銘傳大學 (Ming Chuan University)
Department: 資訊傳播工程學系碩士班
Discipline: Communication
Subfield: General Mass Communication
Thesis Type: Academic thesis
Publication Year: 2018
Graduation Academic Year: 106 (2017-2018 academic year)
Language: Chinese
Pages: 71
Keywords (Chinese): 聲音情緒辨識, 語意分析, 聲音數據增強, 深度學習, Android
Keywords (English): Audio Emotion Recognition, Semantic Analysis, Voice Data Augmentation, Deep Learning, Android
Usage statistics:
  • Cited by: 1
  • Views: 1197
  • Downloads: 73
  • Saved to reading lists: 0
Abstract: With advances in science and technology, research on audio emotion recognition and semantic analysis has become increasingly important; current applications include companion robots, consumer technology products, and medical care. This study proposes an audio emotion recognition and chat dialogue system. For emotion recognition, recordings are first preprocessed with voice data augmentation and converted into spectrograms using the Short-Time Fourier Transform (STFT); a convolutional neural network (CNN) based on the GoogLeNet architecture then classifies each recording into one of five emotions: calm, happy, sad, angry, or fearful, with an average accuracy of 77.1%. For semantic analysis, the chat corpus is divided into positive and negative texts for training, and a Seq2Seq recurrent neural network (RNN) architecture generates the conversational replies. The system is divided into a client and a server: the client is an Android application running on a mobile phone, and the server is built on Ubuntu Linux together with a web server. A user records speech through the mobile app, the audio file is uploaded to the server, the CNN recognizes the emotion, and the RNN performs semantic analysis to drive the chat dialogue. Depending on the recognized emotion, the system returns a positive or negative reply, and the result is displayed in the user's mobile app.
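The abstract describes converting each recording into a spectrogram with the Short-Time Fourier Transform (STFT) before classifying it with a GoogLeNet CNN. The Python sketch below illustrates that conversion step only, assuming a WAV input; the file names, window length, overlap, and figure size are illustrative assumptions, not values taken from the thesis.

```python
# Minimal STFT-to-spectrogram sketch (illustrative parameters, not the thesis's exact pipeline).
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, e.g. on a headless server
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import stft

sample_rate, samples = wavfile.read("utterance.wav")  # hypothetical input recording
if samples.ndim > 1:                                   # mix stereo down to mono
    samples = samples.mean(axis=1)

# Short-Time Fourier Transform: 512-sample windows with 50% overlap (assumed values)
freqs, times, Zxx = stft(samples, fs=sample_rate, nperseg=512, noverlap=256)
magnitude_db = 20 * np.log10(np.abs(Zxx) + 1e-10)      # magnitude spectrogram in dB

# Save the spectrogram as an image; a GoogLeNet-style CNN would consume images like this.
plt.figure(figsize=(2.24, 2.24), dpi=100)              # small square figure (about 224 px per side)
plt.pcolormesh(times, freqs, magnitude_db, shading="gouraud")
plt.axis("off")
plt.savefig("utterance_spectrogram.png", bbox_inches="tight", pad_inches=0)
plt.close()
```

Images generated this way could then be labeled with the five emotion classes (calm, happy, sad, angry, fearful) and used to train or query the CNN classifier.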
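The abstract also outlines the client/server flow: the Android app records speech, uploads the audio file to the Ubuntu server, the CNN recognizes the emotion, a Seq2Seq RNN produces the chat reply, and the result is returned to the phone. The sketch below shows one way such an endpoint could look, assuming a Flask web server (one of the tools the thesis lists); the route name, request fields, and the two helper functions are hypothetical placeholders rather than the thesis's actual implementation.

```python
# Hypothetical server endpoint sketching the upload -> recognize -> reply flow.
from flask import Flask, request, jsonify

app = Flask(__name__)

def recognize_emotion(wav_path: str) -> str:
    """Placeholder for the GoogLeNet spectrogram classifier (CNN)."""
    return "calm"  # one of: calm, happy, sad, angry, fearful

def generate_reply(text: str, emotion: str) -> str:
    """Placeholder for the Seq2Seq chat model (RNN), steered by the detected emotion."""
    positive = emotion in ("calm", "happy")
    return "Glad to hear that!" if positive else "I'm here with you; it will be okay."

@app.route("/analyze", methods=["POST"])
def analyze():
    audio = request.files["audio"]               # recording uploaded by the Android app
    audio.save("upload.wav")
    emotion = recognize_emotion("upload.wav")    # CNN emotion recognition
    transcript = request.form.get("text", "")    # speech-to-text result, if the client sends one
    reply = generate_reply(transcript, emotion)  # RNN chat response, positive or negative
    return jsonify({"emotion": emotion, "reply": reply})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

The Android client would then POST its recorded audio (and optionally a transcript) to /analyze and display the returned emotion label and reply to the user.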
Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2: Related Technologies and Literature Review
2.1 Companion Robots
2.1.1 The Paro Robot
2.1.2 The Kuri Robot
2.1.3 The Zenbo Robot
2.1.4 Chatbot: 萬小芳
2.2 Deep Learning
2.2.1 Convolutional Neural Networks
2.2.2 Recurrent Neural Networks
2.3 Speech Recognition Technology
2.4 Principles of Musical Tones
2.4.1 Semitones and Whole Tones
2.4.2 Twelve-Tone Equal Temperament
2.5 Word-to-Vector Techniques
Chapter 3: System Architecture and Research Methods
3.1 System Architecture
3.1.1 Client
3.1.2 Server
3.1.3 Convolutional Neural Network Model
3.1.4 Recurrent Neural Network Model
3.2 Research Methods
3.2.1 Collection of Emotional Audio
3.2.2 Audio-to-Spectrogram Conversion
3.2.3 Audio Segmentation
3.2.4 Voice Data Augmentation
3.2.5 Gender Recognition
3.2.6 Chat Dialogue Method
Chapter 4: Results and Analysis
4.1 System Functions
4.1.1 Client
4.1.2 Server
4.2 Analysis of the Original Method
4.3 Analysis of the Audio Segmentation Method
4.4 Analysis of the Voice Data Augmentation Method
4.5 Analysis of the Gender Recognition Method
4.6 Analysis of the Chat Dialogue Method
4.7 Overall Comparison and Analysis of Results
Chapter 5: Conclusions and Future Work
References
