National Digital Library of Theses and Dissertations in Taiwan

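Author: 林家興
Author (English): Chia-Hsing Lin
Thesis title (Chinese): 基於鑑別式特徵參數求取之強健性聲音事件分類
Thesis title (English): Discriminative Feature Extraction for Robust Audio Event Classification
Advisor: 廖元甫
Committee members: 蔡偉和, 王逸如
Oral defense date: 2010-07-30
Degree: Master's
Institution: National Taipei University of Technology
Department: Graduate Institute of Computer and Communication Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2010
Academic year of graduation: 98 (ROC calendar)
Language: Chinese
Pages: 49
Keywords (Chinese): 聲音事件分類 (audio event classification); 賈柏濾波器 (Gabor filter); 語料驅動 (data-driven)
Keywords (English): audio event classification; Gabor filter; data-driven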
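Non-speech sound events carry meaningful information in certain environments. This thesis investigates, for non-speech audio event classification, whether there are audio features other than the commonly used Mel-frequency cepstral coefficients (MFCCs) that better reveal the distinctive information in non-speech audio, and which feature combinations improve recognition performance in noisy environments. We therefore extract audio features based on the concepts of time-frequency analysis and pattern features: we use Gabor filter features, or data-driven filter features derived with analysis methods such as principal component analysis (PCA) and linear discriminant analysis (LDA), as our new type of audio features. Finally, we apply the minimum classification error (MCE) criterion to fine-tune the resulting features, with the aim of obtaining more discriminative audio features.
The experimental corpus consists of the 105 types of clean sound events in RWCP (Real World Computing Partnership). After adding noise following the Aurora 2 multi-condition setup, we train and test models using the audio features we designed. The experiments show that our new audio features reduce the classification error rate from 4.13% with conventional audio features to 3.17%. The system architecture built on the new type of audio features therefore does achieve robust audio event classification, and the results confirm that the new features are applicable to non-speech signals.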


Traditionally, audio event classification has relied heavily on MFCC (Mel-Frequency Cepstral Coefficient) features. However, MFCCs were originally designed for automatic speech recognition, and it is unclear whether they are still the best features for audio event classification. Moreover, MFCCs are usually not very robust in noisy environments. This thesis therefore proposes several new feature extraction methods, with the aim of achieving better performance and robustness than MFCCs in noisy conditions.
The proposed feature extraction methods are mainly based on the concept of matched filters in the spectro-temporal domain. Several ways of designing the set of matched filters are proposed: handcrafted Gabor filters and three data-driven filter designs using PCA (Principal Component Analysis), LDA-based (Linear Discriminant Analysis) eigenspace analysis, and MCE (Minimum Classification Error) training.
The robustness of the proposed methods is evaluated on the RWCP (Real World Computing Partnership) database, which contains 105 different audio events, with artificially added noise. The experimental settings are similar to the Aurora 2 multi-condition training task. Experimental results show that the MCE method achieved the lowest average error rate, 3.17%, compared with 4.13% for conventional MFCCs, a relative error reduction of about 23%. These results support the superiority and robustness of the proposed audio feature extraction approaches.
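
The matched-filter idea in the abstract can be illustrated with a small sketch. This is not the thesis's actual filter design: the kernel form (a Gaussian envelope times a cosine carrier), the function names `gabor_kernel`, `filter_response`, and `modulated_patch`, and every parameter value below are assumptions chosen only for illustration. The sketch shows that a Gabor kernel tuned to one temporal modulation responds strongly to a time-frequency patch carrying that modulation in phase, and almost not at all to the same modulation shifted by a quarter period.

```python
import math


def gabor_kernel(n_freq, n_time, omega_f, omega_t, sigma_f, sigma_t):
    """Real 2D Gabor kernel over (frequency, time) offsets centred on zero.

    Illustrative form: Gaussian envelope multiplied by a cosine carrier.
    """
    cf, ct = (n_freq - 1) / 2.0, (n_time - 1) / 2.0
    kernel = []
    for i in range(n_freq):
        row = []
        for j in range(n_time):
            df, dt = i - cf, j - ct
            envelope = math.exp(-0.5 * ((df / sigma_f) ** 2 + (dt / sigma_t) ** 2))
            carrier = math.cos(omega_f * df + omega_t * dt)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_response(kernel, patch):
    """Inner product of the kernel with an equally sized spectrogram patch."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))


def modulated_patch(n_freq, n_time, omega_t, phase):
    """Toy patch: a pure temporal modulation, identical in every frequency row."""
    ct = (n_time - 1) / 2.0
    row = [math.cos(omega_t * (j - ct) + phase) for j in range(n_time)]
    return [list(row) for _ in range(n_freq)]


# Hypothetical settings, not taken from the thesis.
kernel = gabor_kernel(9, 9, omega_f=0.0, omega_t=math.pi / 2, sigma_f=2.0, sigma_t=2.0)
in_phase = modulated_patch(9, 9, math.pi / 2, phase=0.0)
quadrature = modulated_patch(9, 9, math.pi / 2, phase=-math.pi / 2)

matched = filter_response(kernel, in_phase)
mismatched = filter_response(kernel, quadrature)
print(matched, mismatched)
```

In a real front end the patch would come from a (Mel) spectrogram rather than a synthetic cosine, and a bank of such kernels with different modulation frequencies would produce one feature per filter. The data-driven variants named in the abstract would replace the handcrafted kernel with filters estimated from training patches.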


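Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation and Background
1.2 Research Methods
1.2.1 Reference System and Conventional Audio Feature Extraction
1.2.2 New Audio Feature Extraction
1.3 Main Contributions
1.4 Chapter Overview
Chapter 2 Conventional Audio Feature Extraction
2.1 Mel-Frequency Cepstral Coefficients
2.2 Shifted Delta Cepstral Coefficients
2.3 Normalization and ARMA Filter
Chapter 3 New Audio Feature Extraction
3.1 Gabor Filters
3.1.1 Basic Theory
3.1.2 Applying Gabor Filter Features to Audio Event Classification
3.2 Data-Driven Filters
3.2.1 Implementation of Data-Driven Filters
3.2.2 Principal Component Analysis
3.2.3 Linear Discriminant Analysis
3.2.4 Applying Data-Driven Filters to Audio Event Classification
Chapter 4 New Audio Feature Extraction Based on Minimum Classification Error
4.1 Minimum Classification Error Criterion
4.2 Transformation Matrix Optimization Algorithm
4.3 Applying the Minimum Classification Error Criterion to Audio Event Classification
Chapter 5 Experimental Results and Analysis
5.1 Corpus and Experimental Setup
5.1.1 Corpus Overview
5.1.2 Experimental Setup
5.1.2.1 Hidden Markov Models
5.1.2.2 Experimental Setup for Conventional Audio Features
5.1.2.3 Experimental Setup for New Audio Features
5.1.2.4 Experimental Setup for MCE-Adapted Audio Features
5.2 Experimental Results
5.2.1 Recognition Results for Conventional Audio Features
5.2.2 Recognition Results for New Audio Features
5.2.3 Recognition Results for MCE-Adapted Audio Features
5.3 Comparison and Analysis of Experimental Results
Chapter 6 Conclusions and Future Work
References
Appendix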




