National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

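Author: 廖俊祺
Author (English): Liao, Jyun-Ci
Title: 基於調制頻譜向量之環境聲響事件分類
Title (English): Environmental Sound Event Classification Based on Modulation Spectral Vectors
Advisor: 劉奕汶
Advisor (English): Liu, Yi-Wen
Committee members: 黃元豪; 黃朝宗
Committee members (English): Huang, Yuan-Hao; Huang, Chao-Tsung
Oral defense date: 2017-07-27
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2017
Graduation academic year: 105 (ROC calendar, i.e. 2016–2017)
Language: Chinese
Pages: 48
Keywords (Chinese): 調制頻譜向量; 噪音訓練; 環境聲響事件; 高斯混合模型
Keywords (English): modulation spectral vectors; noisy training; environment sound event; GMM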
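Abstract (translated from the Chinese):
Gaussian mixture models (GMMs) are a mature technique in speech and sound recognition systems, but their recognition performance drops sharply under strong background noise. This thesis proposes combining short-term and long-term feature vectors to raise the recognition rate for environmental sounds under strong background noise. The short-term features are Mel-frequency cepstral coefficients (MFCCs), and the long-term features are modulation spectral vectors (MSVs). MSVs capture the energy envelope of the signal in the frequency domain, and this envelope is effective at resisting interference from environmental noise.
To make the system more robust to noise, this thesis also proposes a training scheme in which the GMMs are exposed to noisy data during training, which helps raise the recognition rate for signals with a low signal-to-noise ratio (SNR). The thesis classifies 8 classes of indoor environmental sound events and reaches a recognition rate above 80% at an SNR of 0 dB.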
Abstract (English):
The Gaussian mixture model (GMM) is well established in both speech and sound recognition, but it performs poorly in environments with high background noise. This thesis proposes a method that combines short-term and long-term features to overcome this issue. The short-term features are Mel-frequency cepstral coefficients (MFCCs), and the long-term features are modulation spectral vectors (MSVs) calculated in the frequency domain. The MSVs capture the envelope information of the signal, which is a robust feature under high noise.
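To make the idea of a long-term envelope feature concrete, here is a minimal Python sketch of a generic modulation spectrum. It is not the thesis's MSV definition, which this record does not give. The function names (`stft_magnitude`, `modulation_spectrum`), the frame length, the hop size, and the test signal are all assumptions chosen for illustration. The sketch takes a short-time magnitude spectrogram and then applies a second FFT along the time axis of each frequency band, so that slow envelope fluctuations show up as peaks at low modulation frequencies.

```python
import numpy as np

def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude spectrogram with shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def modulation_spectrum(x, fs, frame_len=256, hop=128):
    """Second FFT along time for every acoustic-frequency band.

    Returns (mod_freqs, ms), where ms has shape
    (n_modulation_bins, n_acoustic_bins).
    """
    mag = stft_magnitude(x, frame_len, hop)
    # Remove each band's mean so the DC term does not dominate.
    env = mag - mag.mean(axis=0, keepdims=True)
    ms = np.abs(np.fft.rfft(env, axis=0))
    # Band envelopes are sampled once per hop, i.e. every hop / fs seconds.
    mod_freqs = np.fft.rfftfreq(mag.shape[0], d=hop / fs)
    return mod_freqs, ms

# Hypothetical test signal: a 1 kHz tone whose amplitude varies at 4 Hz.
fs = 8000
t = np.arange(2 * fs) / fs
x = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

mod_freqs, ms = modulation_spectrum(x, fs)
band = int(round(1000 * 256 / fs))           # acoustic bin closest to 1 kHz
peak = mod_freqs[np.argmax(ms[:, band])]     # expected to land near 4 Hz
print(f"strongest modulation in the 1 kHz band: {peak:.2f} Hz")
```

In this toy case the modulation peak sits at the envelope rate rather than at the carrier frequency, which is the kind of slowly varying structure a long-term feature can summarize.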
For robustness against noise, this thesis also proposes training the GMMs on noisy data. This can raise recognition accuracy in the low signal-to-noise ratio (SNR) case. The method was evaluated on a database consisting of 8 different indoor sound event classes, and it achieves more than 80% accuracy at 0 dB SNR.
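One common way to prepare noisy training data is to add a noise recording to each clean clip at a controlled SNR. The Python sketch below shows that step only. It assumes additive mixing, and the helper names (`mix_at_snr`, `noisy_training_set`) and the SNR levels are illustrative rather than taken from the thesis.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean clip, scaled so the mixture has the given SNR (dB)."""
    # Repeat or truncate the noise so it matches the clip length.
    noise = np.resize(noise, clean.shape)
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose gain so that p_clean / (gain**2 * p_noise) == 10 ** (snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def noisy_training_set(clean_clips, noise, snr_levels_db=(20, 10, 0)):
    """Clean clips plus one noisy copy per SNR level (levels are assumptions)."""
    augmented = list(clean_clips)
    for clip in clean_clips:
        for snr_db in snr_levels_db:
            augmented.append(mix_at_snr(clip, noise, snr_db))
    return augmented

# Hypothetical data: a 440 Hz tone as the "event" and white noise as background.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
noise = rng.standard_normal(4000)

noisy = mix_at_snr(clean, noise, snr_db=0.0)
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
train = noisy_training_set([clean], noise)
print(f"achieved SNR: {achieved:.2f} dB, training clips: {len(train)}")
```

Features would then be extracted from every clip in the augmented set before fitting one GMM per sound class, so that each model has already seen noisy examples when it is tested at low SNR.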
Abstract (Chinese)
Abstract
Acknowledgments

1 Introduction
1.1 Motivation
1.2 Literature Review
1.3 Research Direction

2 System Architecture and Methods
2.1 Signal Preprocessing and Noisy Training
2.2 Mel-frequency Cepstral Coefficients (MFCCs)
2.3 Modulation Spectral Vectors (MSVs)
2.3.1 Unit of Modulation Spectral Vectors
2.3.2 Long-term Feature Extraction
2.4 Feature Vector Analysis
2.4.1 Autocorrelation
2.4.2 Sum of Modulation Spectral Vectors
2.5 Gaussian Mixture Models (GMMs)
2.5.1 Model Description
2.5.2 Initialization of Model Parameters
2.5.3 Expectation Maximization (EM) Algorithm
2.5.4 GMM Training Procedure

3 Analysis and Discussion
3.1 Single Sound Event Database
3.2 Performance Evaluation
3.3 Comparison of Short-term and Long-term Feature Vectors
3.4 Recognition Results with Noisy Training
3.5 Recognition Results with the Sum of Modulation Spectral Vectors
3.6 Discussion of Noise Types

4 Conclusion and Future Work

References