
National Digital Library of Theses and Dissertations in Taiwan

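Author: 葉藍霙
Author (English): Yeh, Lan-Ying
Title (Chinese): 使用時頻變化調變於強健語音情緒辨識
Title (English): Spectro-Temporal Modulations for Robust Speech Emotion Recognition
Advisor: 冀泰石
Advisor (English): Chi, Tai-Shih
Degree: Master's
Institution: National Chiao Tung University
Department: Institute of Communications Engineering
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2010
Graduation academic year: 98 (ROC calendar, i.e., 2009-2010)
Language: English
Pages: 48
Keywords: emotion; speech; recognition; classification; features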
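Abstract (translated from the Chinese version): Speech emotion classification has emerged as a research topic in recent years, and most current studies focus on classifying clean speech. In this thesis, we use an auditory perception model to propose new spectro-temporal modulation features (joint Rate-Scale features, RS features) and apply them to speech emotion recognition under noisy conditions. We add white noise and babble noise at different signal-to-noise ratios (SNRs) to the Berlin Emotional Database and the FAU AIBO Database, and evaluate performance by training on clean speech and testing on noisy speech, simulating real applications in which the noise level cannot be known in advance. We further use Sequential Forward Floating Selection (SFFS) to examine the redundancy of the proposed features and to reduce the required feature dimensionality. On the Berlin Emotional Database, RS features achieve higher recognition rates than conventional prosodic features combined with Mel-frequency cepstral coefficients (MFCCs), especially at low SNRs. On the FAU AIBO Database, both the conventional features and the RS features are over-trained when the SNR is very high, so further dimensionality reduction and feature improvement are needed.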
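The noisy test sets described in the abstract are clean utterances with white or babble noise added at a chosen SNR. A minimal sketch of the usual way to do this is to scale the noise so that 10·log10(P_clean / P_noise) equals the target SNR before adding it. This record does not say how the thesis measured power (whole utterance or speech-active segments only) or which SNR levels were used, so the function names, the 16 kHz rate, the sine-tone stand-in for speech, and the SNR values below are all hypothetical. For babble noise, a recorded babble waveform at the same sampling rate would replace the Gaussian white noise; the reference list cites NOISEX-92 [17], which contains babble recordings.

```python
import math
import random
from typing import List, Sequence


def mean_power(signal: Sequence[float]) -> float:
    """Average power (mean squared amplitude) of a sampled signal."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Return clean + g * noise, with gain g chosen so that
    10 * log10(P_clean / P_scaled_noise) equals snr_db.

    Only the first len(clean) noise samples are used.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    segment = noise[: len(clean)]
    # Target noise power is P_clean / 10^(SNR/10); the gain is the amplitude ratio.
    gain = math.sqrt(mean_power(clean) / (mean_power(segment) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, segment)]


def measured_snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """SNR of `noisy` relative to `clean`, treating their difference as noise."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


if __name__ == "__main__":
    fs = 16000  # hypothetical sampling rate
    rng = random.Random(0)
    # Stand-ins: a 1 s, 220 Hz tone for an utterance and Gaussian white noise.
    clean = [0.5 * math.sin(2 * math.pi * 220 * t / fs) for t in range(fs)]
    white = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    for snr in (20, 10, 0, -5):  # hypothetical test SNRs
        noisy = add_noise_at_snr(clean, white, snr)
        print(f"target {snr:>3} dB -> measured {measured_snr_db(clean, noisy):.2f} dB")
```

In a clean-train/noisy-test setup, only the test utterances would pass through `add_noise_at_snr`; the classifier is trained on the untouched clean features, so it never sees the noise level it is evaluated on.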
Abstract (English): Speech emotion recognition has mostly been studied on clean speech. In this thesis, joint Rate-Scale features (RS features) are extracted from an auditory model and applied to detecting the emotional state of noisy speech. The noisy speech is derived from the Berlin Emotional Speech database and the FAU AIBO database by adding white and babble noise at various SNR levels. A clean-train/noisy-test scenario is investigated to simulate conditions with unknown noise sources. The sequential forward floating selection (SFFS) method is adopted to demonstrate the redundancy of the RS features, and further dimensionality reduction is conducted. On the Berlin database, RS features show higher recognition rates than conventional MFCCs plus prosodic features, especially in low SNR conditions. However, on the AIBO database, both the conventional and the RS features are over-trained in low SNR conditions, so further feature selection or reduction techniques are required.
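The abstract states that SFFS [22] is used to examine the redundancy of the RS features and to reduce their dimensionality. A minimal, generic sketch of SFFS, following the inclusion / conditional-exclusion scheme of Pudil et al., is shown below. The selection criterion is a caller-supplied function; the criterion actually used in the thesis (for example, a classifier's recognition rate on held-out data) is not given in this record, and the toy scores in the usage block are made up purely to show the floating backtrack at work.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Tuple

Subset = FrozenSet[Hashable]


def sffs(
    features: Iterable[Hashable],
    criterion: Callable[[Subset], float],
    target_size: int,
) -> Dict[int, Tuple[Subset, float]]:
    """Sequential Forward Floating Selection, after Pudil et al. (1994).

    `criterion(subset)` returns a score to maximize (e.g. validation accuracy).
    Returns {k: (best subset of size k found, its score)} for k = 1..target_size.
    """
    pool = list(features)
    if not 1 <= target_size <= len(pool):
        raise ValueError("target_size must be between 1 and the number of features")

    best: Dict[int, Tuple[Subset, float]] = {}
    current: Subset = frozenset()

    while len(current) < target_size:
        # Inclusion: add the remaining feature that maximizes the criterion.
        remaining = [f for f in pool if f not in current]
        f_add = max(remaining, key=lambda f: criterion(current | {f}))
        current = current | {f_add}
        score = criterion(current)
        k = len(current)
        if k not in best or score > best[k][1]:
            best[k] = (current, score)

        # Conditional exclusion: drop the least useful feature only while the
        # smaller subset beats the best subset of that size found so far.
        while len(current) > 2:
            f_rm = max(current, key=lambda f: criterion(current - {f}))
            reduced = current - {f_rm}
            reduced_score = criterion(reduced)
            if reduced_score > best[len(reduced)][1]:
                current = reduced
                best[len(current)] = (current, reduced_score)
            else:
                break

    return best


if __name__ == "__main__":
    # Toy criterion (made-up scores): "a" is the best single feature,
    # but {"b", "c"} is the best pair, which plain forward selection misses.
    scores = {
        frozenset("a"): 1.0, frozenset("b"): 0.9, frozenset("c"): 0.8,
        frozenset("ab"): 1.1, frozenset("ac"): 1.05, frozenset("bc"): 2.0,
        frozenset("abc"): 2.1,
    }
    result = sffs("abc", scores.__getitem__, target_size=3)
    for k, (subset, score) in sorted(result.items()):
        print(k, sorted(subset), score)  # k = 2 prints: 2 ['b', 'c'] 2.0
```

Plain sequential forward selection would stop at {a, b} for k = 2; the conditional-exclusion step is what recovers {b, c} after the third feature is added. Because the criterion is evaluated many times per step, a classifier-based criterion dominates the running time, which matters for a high-dimensional feature set such as RS features.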
Chinese Abstract
English Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Motivation
1.2. Related Works
1.3. Experimental Framework
1.4. Thesis Organization
Chapter 2 Literature Review
2.1. Auditory Model
2.1.1. Hearing Physiology
2.1.2. Cochlear Module
2.1.3. Cortical Module and Rate-Scale Representation
2.2. Support Vector Machine (SVM)
2.2.1. Separable problem
2.2.2. Binary non-separable problem
2.2.3. Nonlinear problem
Chapter 3 Database and Feature Extraction
3.1. Berlin Emotional Speech Database (EMO-DB)
3.2. FAU AIBO database
3.3. Rate-Scale (RS) Features
3.4. MFCC Features
3.5. Prosodic Features
3.6. INTERSPEECH 2009 Emotion Challenge Acoustic Features
Chapter 4 Simulation Result
4.1. Experimental Setup
4.2. Results on Berlin Database
4.3. Results on FAU AIBO Database
Chapter 5 Conclusion and Future Works
5.1. Conclusion
5.2. Future Works
Reference
[1] T. Chi, Y. Gao, M. C. Guyton, P. Ru, and S. Shamma, "Spectro-temporal modulation
transfer functions and speech intelligibility," The Journal of the Acoustical Society of
America, vol. 106, p. 2719, 1999.
[2] T. Chi, P. Ru, and S.A. Shamma, “Multi-resolution spectro-temporal analysis of
complex sounds,” J. Acoust. Soc. Am., vol. 118, no. 2, pp. 887-906, 2005.
[3] B. Schuller, G. Rigoll, and M. Lang, “Hidden Markov Model-Based Speech Emotion
Recognition,” Proc. ICASSP, 2003, vol. 2, pp. 1-4.
[4] Dan-Ning Jiang and Lian-Hong Cai, “Speech Emotion Classification with the
Combination of Statistic Features and Temporal Features,” Proc. ICME, 2004, pp.
1967-1970.
[5] D. Ververidis and C. Kotropoulos, “Emotional speech recognition: Resources, features,
and methods,” Speech Comm., vol. 48, no. 9, pp. 1162–1181, September 2006.
[6] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, “A survey of affect recognition
methods: Audio, visual, and spontaneous expressions,” IEEE Trans. on Pattern
Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39–58, 2009.
[7] B. Schuller and G. Rigoll, “Timing Levels in Segment-Based Speech Emotion
Recognition,” Proc. INTERSPEECH 2006, ICSLP, ISCA, pp. 1818-1821, Pittsburgh,
PA, 2006.
[8] F. Ringeval and M. Chetouani, “A vowel based approach for acted emotion
recognition,” Proc. Interspeech, 2008.
[9] B. Schuller, G. Rigoll, and M. Lang, “Speech Emotion Recognition Combining
Acoustic Features and linguistic information in a hybrid support vector machine-belief
network architecture,” Proc. ICASSP, 2004, Vol. I, pp. 577-580.
[10] Feng Yu, Eric Chang, Ying-Qing Xu, and Heung-Yeung Shum, “Emotion detection
from speech to enrich multimedia content,” Proc. IEEE Pacific-Rim Conf. on
Multimedia 2001, Vol. 1, pp. 550–557, 2001.
[11] Tsang-Long Pao, Yu-Te Chen, Jun-Heng Yeh, and Pei-Jia Li, “Mandarin emotional
speech recognition based on SVM and NN,” Proc. of the 18th International
Conference on Pattern Recognition (ICPR’06), vol. 1, September 2006, p. 1096-0.
[12] B. Schuller, D. Arsić, F. Wallhoff, and G. Rigoll, “Emotion Recognition in the Noise
Applying Large Acoustic Feature Sets,” in Proc. Speech Prosody, 2006.
[13] B. Schuller, D. Seppi, A. Batliner, A. Maier, and S. Steidl, “Towards More Reality in
the Recognition of Emotional Speech,” Proc. ICASSP, 2007, Vol. IV, pp. 941-944.
[14] Felix Burkhardt, Astrid Paeschke, Miriam Rolfes, Walter Sendlmeier, and Benjamin
Weiss, “A Database of German Emotional Speech,” Proc. Interspeech, Lisbon,
Portugal, 2005, pp. 489-492.
[15] S. Steidl, “Automatic Classification of Emotion-Related User States in Spontaneous
Children’s Speech,” Logos Verlag, Berlin, 2009.
[16] Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines,
2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[17] A. Varga and H. J. M. Steeneken, “Assessment for automatic speech recognition: II.
NOISEX-92: A database and an experiment to study the effect of additive noise
on speech recognition systems,” Speech Comm., vol. 12, no. 3, pp. 247-251, 1993.
[18] B. Schuller, S. Steidl, and A. Batliner, “The INTERSPEECH 2009 Emotion
Challenge,” Proc. Interspeech, 2009, pp. 312-315.
[19] H. Kawahara, Alain de Cheveigné, H. Banno, T. Takahashi, and T. Irino, “Nearly
Defect-free F0 Trajectory Extraction for Expressive Speech Modifications based on
STRAIGHT,” Proc. Interspeech, 2005, pp. 537-540.
[20] F. Eyben, M. Wöllmer, and B. Schuller, “Speech and Music Interpretation by
Large-Space Extraction,” 2009. Software available at http://sourceforge.net/projects/openSMILE.
[21] B. Schuller, M. Wöllmer, F. Eyben, and G. Rigoll, “Spectral or Voice Quality?
Feature Type Relevance for the Discrimination of Emotion Pairs,” in The Role of
Prosody in Affective Speech, Linguistic Insights, Studies in Language and
Communication, Vol. 97, Sylvie Hancil (ed.), Peter Lang Publishing Group, ISBN
978-3-03911-696-6, pp. 285-307, 2009.
[22] P. Pudil, F. J. Ferri, J. Novovicova, and J. Kittler, “Floating search methods for feature
selection with nonmonotonic criterion functions,” Proc. International Conference on
Computer Vision & Image Processing, pp. 279-283, 1994.
[23] N. V. Chawla, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, “SMOTE: Synthetic
Minority Over-sampling Technique,” Journal of Artificial Intelligence Research, vol. 16,
pp. 321-357, 2002.
[24] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard, “A study of the behavior of
several methods for balancing machine learning training data,” ACM SIGKDD
Explorations Newsletter, vol. 6, no. 1, pp. 20-29, 2004.