National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: Su, Yu-Che (蘇于哲)
Title: Emotion Recognition based on Chinese Speech Signals (基於中文語音之情緒辨識)
Advisor: Leh Luoh (駱樂)
Degree: Master's
Institution: Chung Hua University
Department: Department of Electrical Engineering
Discipline: Engineering
Academic field: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2008
Graduation academic year: 96 (ROC calendar, i.e. 2007-2008)
Language: English
Pages: 59
Keywords (Chinese): 語音辨識 (speech recognition); 情緒 (emotion)
Keywords (English): speech recognition; emotion
Record statistics:
  • Cited: 0
  • Views: 218
  • Downloads: 0
  • Bookmarked: 0
Abstract (Chinese, translated): Correctly expressing and recognizing emotion is an important part of interpersonal communication. This thesis develops a method for recognizing five basic human emotions from Mandarin Chinese speech. In earlier speech-based emotion classification, the commonly used speech features were statistics of pitch, pitch energy, and duration. These prosodic features, however, are easily corrupted by noise, and recognition methods built on them suffer a substantial drop in recognition rate. For speech emotion recognition, we instead select 13 MFCC coefficients as the feature vector. We then build a probability-distribution model of the feature parameters for each of the five emotions, describing each distribution with a Gaussian Mixture Model (GMM). To classify an utterance, we extract its feature parameters and compare them against the five emotion models; the emotion with the highest likelihood is the recognition result. The corpus used in this thesis consists of 500 utterances: two female speakers each producing 20 sentences of varying length under the different emotions. Experimentally, the recognition rates for the happiness emotion are 74% and 68% for the two speakers, and the average recognition rates over all emotions are 55% and 48%, respectively.
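The abstract selects 13 MFCC coefficients as the feature vector, and Chapter 3 of the thesis walks through the standard extraction pipeline (pre-emphasis, frame blocking, windowing, mel filter bank, DCT). The following is a generic NumPy sketch of that pipeline, not the author's code; all parameter values (16 kHz sampling rate, 25 ms frames with 10 ms steps, 26 mel filters, 512-point FFT, 0.97 pre-emphasis) are common defaults assumed for illustration.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_mfcc=13, frame_len=0.025, frame_step=0.010,
         n_filt=26, n_fft=512, preemph=0.97):
    # 1. Pre-emphasis: boost high frequencies, s'[n] = s[n] - a*s[n-1]
    sig = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # 2. Frame blocking into overlapping frames
    flen, fstep = int(frame_len * sr), int(frame_step * sr)
    n_frames = 1 + max(0, (len(sig) - flen) // fstep)
    frames = np.stack([sig[i * fstep: i * fstep + flen]
                       for i in range(n_frames)])
    # 3. Hamming window applied to each frame
    frames = frames * np.hamming(flen)
    # 4. Power spectrum of each windowed frame
    pow_spec = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # 5. Triangular mel-scale filter bank, equally spaced on the mel axis
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filt + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(1, n_filt + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 6. Log filter-bank energies, then DCT; keep the first 13 coefficients
    feat = np.log(pow_spec @ fbank.T + 1e-10)
    return dct(feat, type=2, axis=1, norm='ortho')[:, :n_mfcc]
```

Each utterance thus becomes a sequence of 13-dimensional feature vectors, one per 10 ms frame, which are the observations modeled by the per-emotion GMMs.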
Abstract: In human relationships, expressing emotion correctly is an important aspect of communication. This thesis presents a speech-based emotion classification method for five basic human emotions: anger, boredom, happiness, neutral, and sadness. In emotion classification of speech signals, the traditional features are statistics of pitch, pitch energy, and duration; however, the performance of systems employing these prosodic features degrades substantially because they are easily influenced by noise. For speech emotion recognition, we instead select 13 MFCC coefficients as the basic features to form the feature vector. We then model the probability distribution of the feature parameters for each of the five emotions with a Gaussian Mixture Model (GMM) and apply a maximum-likelihood (ML) decision rule for identification: an input utterance is assigned the emotion whose model yields the highest likelihood among the five emotion models. The corpus consists of 500 emotional utterances from two female speakers. The results show that the emotion patterns can be recognized fairly well. For the happiness emotion, the best accuracies are 74% and 68% for the two female speakers, and the average accuracies over all emotions are 55% and 48%, respectively.
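The classification scheme the abstract describes (one GMM per emotion, maximum-likelihood decision over the five models) can be sketched as follows. This is a minimal illustration using scikit-learn's `GaussianMixture`, not the thesis implementation; the stand-in "MFCC" frames are synthetic Gaussians, and the 8-mixture, diagonal-covariance configuration is one of the mixture counts (8, 16, 32, 64) the thesis compares.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical stand-in corpus: 13-dimensional "MFCC" frames per emotion,
# drawn from shifted Gaussians instead of real recordings.
emotions = ["anger", "boredom", "happiness", "neutral", "sadness"]
train = {e: rng.normal(loc=i, scale=1.0, size=(400, 13))
         for i, e in enumerate(emotions)}

# Train one GMM per emotion on that emotion's training frames.
models = {e: GaussianMixture(n_components=8, covariance_type="diag",
                             random_state=0).fit(X)
          for e, X in train.items()}

def classify(frames):
    # ML decision rule: sum per-frame log-likelihoods under each emotion
    # model and pick the emotion with the highest total.
    scores = {e: m.score_samples(frames).sum() for e, m in models.items()}
    return max(scores, key=scores.get)

# An unseen utterance drawn near the "happiness" cluster.
test_utt = rng.normal(loc=2, scale=1.0, size=(120, 13))
print(classify(test_utt))
```

Summing frame log-likelihoods treats the frames as conditionally independent given the emotion, which is the usual GMM-based assumption; the thesis's reported 55% and 48% averages come from real emotional speech, where the five class distributions overlap far more than in this toy setup.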
TABLE OF CONTENTS

摘要 (Chinese Abstract)
ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

Chapter 1 INTRODUCTION
1.1 Introduction
1.2 Definition of Emotion
1.3 Theories of Emotion
1.4 Models and Categories of Emotions
1.5 Motivation and Objectives
1.6 Thesis Organization

Chapter 2 BACKGROUND
2.1 Emotional States and Mandarin Speech
2.2 Related Works

Chapter 3 EMOTION RECOGNITION METHODS
3.1 Basic Concepts of Speech Production
3.2 Short-Time Speech Processing
3.2.1 Digital Models for Speech Signals
3.3 Preprocessing
3.3.1 Pre-emphasis
3.3.2 Frame blocking
3.3.3 Windowing
3.4 Feature Extraction
3.4.1 Mel frequency cepstral coefficients
3.5 Emotion Classification
3.5.1 Gaussian Mixture Model
3.5.2 Model Training and Parameter Estimation
3.5.3 K-means algorithm
3.5.4 Emotion Recognition System Implementation

Chapter 4 EXPERIMENTAL RESULTS
4.1 Create a Speech Corpus
4.1.1 The Testing Corpus
4.1.2 The Training Corpus
4.2 Emotion Recognition Experiment Results
4.2.1 Experimental Results of Speaker-Independent and Speaker-Dependent Recognition
4.2.2 Experimental Results of Emotion Cluster Recognition

Chapter 5 CONCLUSION AND FUTURE WORKS
REFERENCES
APPENDIX

LIST OF FIGURES
1.1 Graphic representation of the arousal-valence theory of emotions
1.2 Plutchik's "emotion wheel"
1.3 Development of emotional distinctions by Fox
3.1 Emotion recognition method architecture
3.2 Schematic view of the human vocal mechanism [20]
3.3 General discrete-time model for speech production
3.4 Blocking of speech into overlapping frames
3.5 Block diagram of MFCC extraction
3.6 Nonlinear transformation from linear frequency to the Mel scale
3.7 Mel-scale filter bank
3.8 Gaussian Mixture Models for emotional speech
3.9 Mean of GMM using 8 mixtures
3.10 3D histogram of MFCC1-MFCC2
3.11 Comparison of GMM distributions using 8, 16, 32, and 64 mixtures
3.12 Training the emotion model procedure
3.13 Emotion recognition procedure
4.1 Speaker A happiness recognition rate
4.2 Speaker B happiness recognition rate

LIST OF TABLES
1.1 Summary of the lists of emotions defined by different scientists [10]
4.1 Utterances of corpus I
4.2 Mandarin vowel classification
4.3(a) Recognition results for Speaker A
4.3(b) Recognition results for Speaker B
REFERENCES
[1] C. M. Lee and S. Narayanan, "Emotion Recognition Using Data-Driven Fuzzy Inference System," Eurospeech 2003, Geneva.
[2] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J. G. Taylor, "Emotion Recognition in Human-Computer Interaction," IEEE Signal Processing Magazine, vol. 18, pp. 32-80, 2001.
[3] S. E. Bou-Ghazale and J. H. L. Hansen, "A comparative study of traditional and newly proposed features for recognition of speech under stress," IEEE Trans. Speech Audio Process., vol. 8, no. 4, pp. 429-442, Jul. 2000.
[4] 馮觀富, Psychology of Emotion (情緒心理學), 心理出版社, 2005.
[5] T. Yamada, H. Hashimoto, and N. Tosa, "Pattern Recognition of Emotion with Neural Network," Proc. IEEE IECON 21st Int. Conf. on Industrial Electronics, Control, and Instrumentation, vol. 1, pp. 183-187, 1995.
[6] K. T. Strongman, The Psychology of Emotion (情緒心理學), translated by 游恆山, 五南圖書, Taipei, 2002.
[7] H. Holzapfel, Emotionen als Parameter der Dialogverarbeitung, University of Karlsruhe, 2003.
[8] R. Tato, R. Santos, R. Kompe, and J. M. Pardo, "Emotional Space Improves Emotion Recognition," ICSLP, pp. 2029-2032, 2002.
[9] L. Rabiner and R. Schafer, Digital Processing of Speech Signals, Prentice-Hall, N.J., 1978.
[10] A. Ortony and T. J. Turner, "What's basic about basic emotions?" Psychological Review, pp. 315-331, 1990.
[11] K. R. Scherer, "Vocal communication of emotion: A review of research paradigms," Speech Commun., vol. 40, pp. 227-256, 2003.
[12] 王小川, Speech Signal Processing (語音訊號處理), 全華科技圖書, 2004.
[13] N. Esau, L. Kleinjohann, and B. Kleinjohann, "Fuzzy Emotion Recognition in Natural Speech Dialogue," IEEE Int. Workshop on Robot and Human Interactive Communication, pp. 317-322, Aug. 2005.
[14] C. Breazeal, "Emotive qualities in robot speech," IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, vol. 3, pp. 1388-1394, 2001.
[15] C. M. Lee and S. Narayanan, "Emotion Recognition Using Data-Driven Fuzzy Inference System," Eurospeech 2003, Geneva.
[16] A. A. Razak, R. Komiya, and M. I. Zaimal Abidin, "Comparison between fuzzy and NN method for speech emotion recognition," Proc. IEEE Third Int. Conf. on Information Technology and Applications, vol. 1, pp. 297-302, July 2005.
[17] O. W. Kwon, K. Chan, J. Hao, and T. W. Lee, "Emotion Recognition by Speech Signals," Eurospeech, pp. 125-128, 2003.
[18] T. L. Nwe, S. W. Foo, and L. C. De Silva, "Speech Emotion Recognition Using Hidden Markov Models," Speech Communication, pp. 603-623, 2003.
[19] S. Ramamohan and S. Dandapat, "Sinusoidal Model-Based Analysis and Classification of Stressed Speech," IEEE Trans. Audio, Speech, and Language Processing, vol. 14, no. 3, May 2006.
[20] X. Huang, A. Acero, and H. Hon, Spoken Language Processing, Prentice Hall, 2001.
[21] L. Rabiner and R. Schafer, Digital Processing of Speech Signals, Prentice-Hall, N.J., 1978.
[22] J. F. Kaiser, Discrete-Time Speech Signal Processing, pp. 11-99, Prentice-Hall PTR, 2002.
[23] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, N.J., 1993.
[24] S. Furui, Digital Speech Processing, Synthesis, and Recognition, 2nd ed., revised and expanded, Marcel Dekker, New York, 2000.
[25] R. Plutchik, Emotion: A Psychoevolutionary Synthesis, Harper & Row, New York, 1980.
[26] D. A. Reynolds and R. C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," IEEE Trans. Speech Audio Process., vol. 3, no. 1, January 1995.
[27] C. M. Bishop, Neural Networks for Pattern Recognition, 1995.
[28] J. H. Yeh, Emotion Recognition from Mandarin Speech Signals, Master's thesis, Tatung University, 2004.
[29] 張柏雄, "Automatic Emotion Recognition of Mandarin Speech (中文語音情緒之自動辨識)," Master's thesis, Department of Engineering Science, National Cheng Kung University, 2002.