National Digital Library of Theses and Dissertations in Taiwan



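Graduate student:

葉藍霙

Graduate student (English):

Yeh, Lan-Ying

Thesis title:

使用時頻變化調變於強健語音情緒辨識

Thesis title (English):

Spectro-Temporal Modulations for Robust Speech Emotion Recognition

Advisor:

冀泰石

Advisor (English):

Chi, Tai-Shih

Degree:

Master's

Institution:

National Chiao Tung University

Department:

Institute of Communications Engineering

Discipline:

Engineering

Field:

Electrical and Computer Engineering

Thesis type:

Academic thesis

Year of publication:

2010

Academic year of graduation:

Language:

English

Number of pages:

Keywords:

emotion, speech, recognition, classification, feature parameters

Keywords (English):

emotion, speech, recognition, classification, feature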


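Speech emotion classification has emerged as a research topic in recent years, and most current studies focus on classification of clean speech. In this thesis, we use an auditory perception model to propose a new spectro-temporal feature set (joint Rate-Scale features, RS features) and use it to address speech emotion recognition under noisy conditions. We add white noise and babble noise at various signal-to-noise ratios (SNRs) to the Berlin Emotional Database and the FAU AIBO Database, and evaluate performance by training on clean speech and testing on noisy speech, which simulates real applications where the noise level cannot be known in advance. We further apply Sequential Forward Floating Selection (SFFS) to examine the redundancy of the proposed features and thereby reduce the required feature dimensionality. On the Berlin Emotional Database, the RS features achieve higher recognition rates than conventional prosodic features combined with Mel-frequency cepstral coefficients (MFCCs), especially at low SNRs. On the FAU AIBO Database, both the conventional features and the RS features show over-training when the SNR is very high, so further dimensionality reduction and feature improvement are needed.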
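The abstract describes building noisy test sets by adding white or babble noise to clean utterances at several SNRs. A minimal sketch of that mixing step follows, assuming plain additive mixing in which the noise is rescaled so that 10·log10(Ps/Pn) equals the target SNR, with power measured over the whole signal (the thesis may use a different level convention, such as active speech level). The helper names `add_noise_at_snr` and `measured_snr_db` are hypothetical, and the sine tone and Gaussian noise only stand in for real speech and for recorded white or babble noise (e.g. from NOISEX-92).

```python
import math
import random


def power(x):
    """Mean power (average squared amplitude) of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(clean, noise, snr_db):
    """Hypothetical helper: add `noise` to `clean`, scaled so the mixture has
    the requested SNR in dB. Assumes whole-signal power, not active speech
    level. Only the first len(clean) noise samples are used."""
    if len(noise) < len(clean):
        raise ValueError("noise segment is shorter than the clean signal")
    noise = noise[:len(clean)]
    p_clean = power(clean)
    p_noise = power(noise)
    # Choose gain g so that p_clean / (g^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR of a mixture, treating (noisy - clean) as the noise component."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # A 1 s, 440 Hz tone at 16 kHz stands in for a clean utterance.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    # Gaussian samples stand in for a white-noise recording.
    white = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    for snr in (20, 10, 0, -5):
        noisy = add_noise_at_snr(clean, white, snr)
        print(snr, round(measured_snr_db(clean, noisy), 6))
```

In a clean-train/noisy-test setup, only the test utterances would pass through a step like `add_noise_at_snr`; the classifier is trained on the untouched clean utterances, so it never sees the noise type or level it is later tested on.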

Speech emotion recognition has mostly been studied on clean speech. In this thesis, joint Rate-Scale features (RS features) are extracted from an auditory model and applied to recognize the emotional state of noisy speech. The noisy speech is derived from the Berlin Emotional Speech database and the FAU AIBO database by adding white and babble noise at various SNR levels. A clean-train/noisy-test scenario is investigated to simulate conditions in which the noise source is not known in advance. The sequential forward floating selection (SFFS) method is adopted to examine the redundancy of the RS features and to further reduce their dimensionality. Compared with conventional MFCCs plus prosodic features, RS features yield higher recognition rates on the Berlin database, especially in low SNR conditions. However, on the AIBO database both the conventional and the RS features are over-trained in low SNR conditions, so further feature selection or dimensionality reduction is required.
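SFFS, the floating search method of Pudil et al., grows a feature subset by forward inclusion and, after every inclusion, conditionally removes features as long as doing so yields a better subset than any previously found of the same size. The sketch below assumes a generic criterion `score(subset)` to be maximized; in the thesis's setting this would presumably be a recognition rate from a classifier such as the SVM reviewed in Chapter 2, but here it is a hypothetical toy function in which features sharing a group (e.g. two pitch statistics) are redundant. The function name `sffs` and all feature names are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Subset = List[str]


def sffs(features: Sequence[str],
         score: Callable[[Subset], float],
         k: int) -> Dict[int, Tuple[float, Subset]]:
    """Sequential Forward Floating Selection (sketch).

    Returns, for each subset size reached, the best (score, subset) found.
    `score` is any criterion to maximize, e.g. cross-validated accuracy.
    """
    k = min(k, len(features))
    selected: Subset = []
    best: Dict[int, Tuple[float, Subset]] = {}
    while len(selected) < k:
        # Inclusion: add the feature that maximizes the criterion.
        candidates = [f for f in features if f not in selected]
        f_add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [f_add]
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, list(selected))
        # Conditional exclusion: drop the least useful feature only while the
        # smaller subset beats the best subset of that size seen so far.
        while len(selected) > 2:
            f_drop = max(selected,
                         key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_drop]
            s_red = score(reduced)
            if s_red > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_red, list(reduced))
            else:
                break
    return best


# Hypothetical toy criterion: features in the same group are redundant,
# and every extra feature costs a small penalty.
GROUP = {"f0_mean": "pitch", "f0_max": "pitch", "energy": "energy",
         "rs_1": "modulation", "rs_2": "modulation"}
WEIGHT = {"pitch": 3.0, "energy": 2.0, "modulation": 4.0}


def toy_score(subset: Subset) -> float:
    covered = {GROUP[f] for f in subset}
    return sum(WEIGHT[g] for g in covered) - 0.1 * len(subset)


if __name__ == "__main__":
    records = sffs(list(GROUP), toy_score, k=4)
    for size, (s, subset) in sorted(records.items()):
        print(size, round(s, 2), subset)
```

With this toy criterion the recorded best score peaks at three features (one per group); the fourth feature repeats a group and only adds the size penalty, which is the kind of redundancy a floating search is used to expose before reducing dimensionality.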

Chinese Abstract
English Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Motivation
1.2. Related Works
1.3. Experimental Framework
1.4. Thesis Organization
Chapter 2 Literature Review
2.1. Auditory Model
2.1.1. Hearing Physiology
2.1.2. Cochlear Module
2.1.3. Cortical Module and Rate-Scale Representation
2.2. Support Vector Machine (SVM)
2.2.1. Separable problem
2.2.2. Binary non-separable problem
2.2.3. Nonlinear problem
Chapter 3 Database and Feature Extraction
3.1. Berlin Emotional Speech Database (EMO-DB)
3.2. FAU AIBO database
3.3. Rate-Scale (RS) Features
3.4. MFCC Features
3.5. Prosodic Features
3.6. INTERSPEECH 2009 Emotion Challenge Acoustic Features
Chapter 4 Simulation Result
4.1. Experimental Setup
4.2. Results on Berlin Database
4.3. Results on FAU AIBO Database
Chapter 5 Conclusion and Future Works
5.1. Conclusion
5.2. Future Works
Reference

[1] T. Chi, Y. Gao, M. C. Guyton, P. Ru, and S. Shamma, “Spectro-temporal modulation transfer functions and speech intelligibility,” J. Acoust. Soc. Am., vol. 106, p. 2719, 1999.
[2] T. Chi, P. Ru, and S. A. Shamma, “Multi-resolution spectro-temporal analysis of complex sounds,” J. Acoust. Soc. Am., vol. 118, no. 2, pp. 887-906, 2005.
[3] B. Schuller, G. Rigoll, and M. Lang, “Hidden Markov model-based speech emotion recognition,” in Proc. ICASSP, 2003, vol. 2, pp. 1-4.
[4] D.-N. Jiang and L.-H. Cai, “Speech emotion classification with the combination of statistic features and temporal features,” in Proc. ICME, 2004, pp. 1967-1970.
[5] D. Ververidis and C. Kotropoulos, “Emotional speech recognition: Resources, features, and methods,” Speech Comm., vol. 48, no. 9, pp. 1162-1181, Sept. 2006.
[6] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, “A survey of affect recognition methods: Audio, visual, and spontaneous expressions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 39-58, 2009.
[7] B. Schuller and G. Rigoll, “Timing levels in segment-based speech emotion recognition,” in Proc. INTERSPEECH 2006, ICSLP, Pittsburgh, PA, 2006, pp. 1818-1821.
[8] F. Ringeval and M. Chetouani, “A vowel based approach for acted emotion recognition,” in Proc. Interspeech, 2008.
[9] B. Schuller, G. Rigoll, and M. Lang, “Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture,” in Proc. ICASSP, 2004, vol. I, pp. 577-580.
[10] F. Yu, E. Chang, Y.-Q. Xu, and H.-Y. Shum, “Emotion detection from speech to enrich multimedia content,” in Proc. IEEE Pacific-Rim Conf. on Multimedia, 2001, vol. 1, pp. 550-557.
[11] T.-L. Pao, Y.-T. Chen, J.-H. Yeh, and P.-J. Li, “Mandarin emotional speech recognition based on SVM and NN,” in Proc. 18th Int. Conf. on Pattern Recognition (ICPR’06), Sept. 2006, vol. 1, p. 1096-0.
[12] B. Schuller, D. Arsić, F. Wallhoff, and G. Rigoll, “Emotion recognition in the noise applying large acoustic feature sets,” in Proc. Speech Prosody, 2006.
[13] B. Schuller, D. Seppi, A. Batliner, A. Maier, and S. Steidl, “Towards more reality in the recognition of emotional speech,” in Proc. ICASSP, 2007, vol. IV, pp. 941-944.
[14] F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, and B. Weiss, “A database of German emotional speech,” in Proc. Interspeech, Lisbon, Portugal, 2005, pp. 489-492.
[15] S. Steidl, Automatic Classification of Emotion-Related User States in Spontaneous Children’s Speech. Berlin: Logos Verlag, 2009.
[16] C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[17] A. Varga and H. J. M. Steeneken, “Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems,” Speech Comm., vol. 12, no. 3, pp. 247-251, 1993.
[18] B. Schuller, S. Steidl, and A. Batliner, “The INTERSPEECH 2009 emotion challenge,” in Proc. Interspeech, 2009, pp. 312-315.
[19] H. Kawahara, A. de Cheveigné, H. Banno, T. Takahashi, and T. Irino, “Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT,” in Proc. Interspeech, 2005, pp. 537-540.
[20] F. Eyben, M. Wöllmer, and B. Schuller, “openSMILE: Speech and Music Interpretation by Large-Space Extraction,” 2009. Available: http://sourceforge.net/projects/openSMILE
[21] B. Schuller, M. Wöllmer, F. Eyben, and G. Rigoll, “Spectral or voice quality? Feature type relevance for the discrimination of emotion pairs,” in The Role of Prosody in Affective Speech, Linguistic Insights, Studies in Language and Communication, vol. 97, S. Hancil, Ed. Peter Lang Publishing Group, 2009, pp. 285-307, ISBN 978-3-03911-696-6.
[22] P. Pudil, F. J. Ferri, J. Novovičová, and J. Kittler, “Floating search methods for feature selection with nonmonotonic criterion functions,” in Proc. International Conference on Computer Vision & Image Processing, 1994, pp. 279-283.
[23] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321-357, 2002.
[24] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 20-29, 2004.
