
臺灣博碩士論文加值系統


Detailed Record

Author: 張柏雄
Author (English): Bor-Hsiung Chang
Title: 中文語音情緒之自動辨識
Title (English): Automated Recognition of Emotion in Mandarin
Advisor: 周榮華
Advisor (English): Jung-Hua Chou
Degree: Master's
Institution: 國立成功大學 (National Cheng Kung University)
Department: 工程科學系碩博士班 (Department of Engineering Science)
Discipline: Engineering
Field: General Engineering
Thesis type: Academic thesis
Year of publication: 2002
Graduation academic year: 90 (ROC calendar)
Language: Chinese
Number of pages: 65
Keywords (Chinese): nonlinear frequency transformation (非線性頻率轉換), emotional speech recognition (情緒語音辨識), Mel frequency scale (Mel頻率刻度), dynamic time warping algorithm (動態扭曲演算法), LBG algorithm (LBG演算法), standard reference patterns (標準參考樣本)
Keywords (English): dynamic time warping, LBG, reference pattern, emotion recognizer, mel frequency
Usage statistics:
  • Cited by: 10
  • Views: 3390
  • Rating:
  • Downloads: 745
  • Bookmarked: 2
This thesis proposes an emotional speech recognition system. The emotions considered are normal, anger, boredom, happiness, and sadness. The spectral parameters of an emotional speech signal, obtained by Fourier transform, are passed through a nonlinear filter bank and converted into energy parameter vectors based on the Mel frequency scale. These vectors are quantized into fixed-length feature vectors by the LBG algorithm, and a modified robust training algorithm is then used to train the feature vectors of each emotion. Exploiting the vowel-based nature of Mandarin speech, the trained feature vectors are further enhanced with emotion-specific features to serve as the standard reference patterns for the different emotions. A dynamic time warping algorithm then computes the minimum distance between a test pattern and the reference patterns to obtain the recognition result. The corpus used in this thesis consists of 591 sentences covering 12 sentence types of different lengths, spoken by two female speakers under the different emotions.
According to the experimental analysis, the average recognition rate of the system is about 50%; the average recognition rates for the two speakers are 51% and 46%, respectively.
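
The Mel filter-bank stage described above can be illustrated with a minimal sketch (not taken from the thesis): it converts the FFT power spectrum of one speech frame into 19 Mel-spaced triangular-filter log energies. The sampling rate, FFT size, and all function names here are illustrative assumptions.

import numpy as np

def hz_to_mel(f_hz):
    # O'Shaughnessy's formula for the Mel scale.
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank_log_energies(frame, sample_rate=8000, n_fft=256, n_filters=19):
    # Power spectrum of one (already pre-emphasized and windowed) speech frame.
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2

    # Filter edge frequencies, equally spaced on the Mel scale from 0 Hz to Nyquist.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sample_rate).astype(int)

    log_energy = np.empty(n_filters)
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        # Triangular weighting: rises from lo to mid, falls from mid to hi.
        rise = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fall = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
        weights = np.concatenate([rise, fall])
        log_energy[i] = np.log(np.dot(weights, power[lo:hi]) + 1e-10)
    return log_energy

Per frame, this yields the 19-dimensional energy vector that the abstracts say is subsequently quantized with the LBG algorithm.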
This study proposes an emotion recognizer for Mandarin speech. Five human emotions embedded in speech, namely normal, anger, boredom, happiness, and sadness, are investigated. The speech spectrum was first calculated using the FFT. A set of 19 Mel-scaled filter banks was then applied to the FFT power spectrum, and a feature vector based on Mel-frequency power coefficients was extracted. The vector for each speech frame was assigned to a cluster by vector quantization, with the quantizer designed using the LBG algorithm. A modified robust training method was used to train the emotion-specific reference patterns, and the emotion features of each reference pattern were further enhanced. Finally, the minimum distance between the reference patterns and the test pattern was computed by dynamic time warping (DTW) to obtain the recognition result. The corpus consists of 591 emotional utterances from two female speakers.
The results show that the emotion patterns can be recognized fairly well. A total average accuracy of 50% is achieved; more specifically, average accuracies of 51% and 46% are obtained for the two female speakers, respectively.
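
The matching stage described in the abstract can be sketched as follows, assuming per-frame feature vectors stored as rows of a NumPy array: a classic DTW accumulated-distance table is filled, and the emotion whose reference pattern lies closest to the test pattern is selected. The function names, the simple symmetric step pattern, and the dictionary of reference patterns are illustrative assumptions, not the thesis implementation.

import numpy as np

def dtw_distance(test, ref):
    # test, ref: 2-D arrays, one feature vector per row (per frame).
    # Classic O(N*M) accumulated-distance table with steps
    # (insertion, deletion, match) and Euclidean local distances.
    local = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=2)
    n, m = local.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = local[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = local[i, j] + best_prev
    return acc[-1, -1]

def classify_emotion(test_pattern, reference_patterns):
    # reference_patterns: dict mapping an emotion label ("normal", "anger",
    # "boredom", "happiness", "sadness") to its reference feature sequence.
    # The label whose reference pattern gives the smallest DTW distance wins.
    return min(reference_patterns,
               key=lambda label: dtw_distance(test_pattern, reference_patterns[label]))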
Chinese Abstract....i
English Abstract....ii
Acknowledgements....iii
Table of Contents....iv
List of Figures....ix
List of Tables....xi
Chapter 1  Introduction....1
1.1 Overview....1
1.2 Literature Review....2
1.2.1 Studies on Emotion Theory....3
1.2.2 Studies on Emotion Analysis with Prosodic Features....4
1.2.3 Studies on Emotion Analysis with Spectral Features....6
1.2.4 Other Related Studies....7
1.3 Motivation and Objectives....8
Chapter 2  Construction of the Emotional Speech Model....9
2.1 Emotional Speech Recognition System....9
2.2 Preprocessing for Feature Analysis....11
2.2.1 Pre-emphasis....11
2.2.2 Frame Segmentation....13
2.2.3 Windowing....14
2.2.4 Fast Fourier Transform....15
2.2.5 Fundamental Frequency Detection....16
2.3 Summary....18
Chapter 3  Mel-Frequency Feature Extraction....20
3.1 Conversion to the Mel Frequency Scale....20
3.2 Nonlinear Filter Bank Design....23
3.3 Energy Feature Extraction....24
3.3.1 FFT Spectrum....25
3.3.2 Computing the Log Energy of Each Filter....25
3.4 Summary....27
Chapter 4  Vector Quantization and Pattern Matching Algorithms....28
4.1 Preface....28
4.2 Principles of Vector Quantization....28
4.3 Vector Quantizer Design....29
4.3.1 LBG Algorithm....30
4.3.2 Implementation of the Vector Encoding Algorithm....31
4.4 Reference Pattern Training Methods....33
4.4.1 Modified Robust Training Method....34
4.4.2 Enhancement of Emotion Features....35
4.5 Pattern Comparison Algorithms....38
4.6 Linear Time Warping Algorithm....38
4.7 Principles of Dynamic Programming....39
4.8 Dynamic Time Warping Algorithm....41
4.9 Summary....43
Chapter 5  Implementation of the Recognition System and Experimental Results....44
5.1 Construction of the Speech Corpus....44
5.2 Implementation of the Recognition System....47
5.2.1 Training Phase....47
5.2.2 Recognition Phase....49
5.3 Experimental Results and Discussion....50
5.3.1 Experimental Results....50
5.3.2 Discussion....52
Chapter 6  Conclusions and Future Work....55
6.1 Conclusions....55
6.2 Future Work....56
References....58
Appendix....63
Curriculum Vitae....65