臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

詳目顯示 (Detailed Record)

Author: 林文琦 (Wen-chi Lin)
Title: 使用離散餘弦轉換處理動態特徵之強健性語音辨認
Title (English): DCT-based Processing of Dynamic Features for Robust Speech Recognition
Advisor: 洪志偉 (Jeih-weih Hung)
Degree: Master's
Institution: 國立暨南國際大學 (National Chi Nan University)
Department: 電機工程學系 (Department of Electrical Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2010
Graduation Academic Year: 98 (2009-2010)
Language: English
Pages: 44
Keywords (Chinese): 自動語音辨識、離散餘弦轉換、時間濾波器
Keywords (English): automatic speech recognition, discrete cosine transform, temporal filter
Abstract (translated from the Chinese):
The discrete cosine transform (DCT) has been widely used in data compression and speech recognition. In this thesis, we develop a series of DCT-based temporal filters for speech features in order to improve their robustness against environmental distortions. By analyzing the new filters obtained from windowing the DCT coefficients, the effective speech components can be extracted, yielding more robust dynamic speech features. By comparing the experimental results, we further show how to select the best filter combination. For these new filters, we also analyze how filter coefficients of different lengths affect the robustness of the filtered speech features.
All experiments in this thesis are conducted on the Aurora 2 database released by the European Telecommunications Standards Institute (ETSI). The results show that the newly developed filters effectively improve speech recognition accuracy. For example, the three proposed modified DCT-based filters improve the average recognition accuracy over the mel-frequency cepstral coefficient (MFCC) baseline by 6.33%, 7.83%, and 8.71%, respectively, and over the original, unmodified DCT-based filtering method by 4.84%, 6.34%, and 7.22%, respectively. This indicates that the newly developed filtering techniques alleviate the effect of additive noise on the speech features.
In addition, we combine the four proposed methods with statistics normalization techniques, namely mean subtraction and mean-and-variance normalization. The combined methods improve the average recognition accuracy over the MFCC baseline by approximately 6% (with mean subtraction) and 18% (with mean-and-variance normalization).
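As a concrete illustration of the two statistics normalization methods mentioned above, the following Python sketch applies per-utterance mean subtraction and mean-and-variance normalization to a feature matrix. This is a minimal sketch, not the thesis's implementation; the function names, the (frames x dims) layout, and the 39-dimensional stand-in features are assumptions.

```python
# Minimal sketch of the statistics normalization methods referred to above:
# per-utterance mean subtraction and mean-and-variance normalization (MVN).
# Function names and the (frames x dims) layout are assumptions, not the
# thesis's own code.
import numpy as np

def mean_subtraction(features):
    """Remove the per-dimension mean over time; features shape = (frames, dims)."""
    return features - features.mean(axis=0, keepdims=True)

def mean_variance_normalization(features, eps=1e-8):
    """Normalize each dimension to zero mean and unit variance over time."""
    mu = features.mean(axis=0, keepdims=True)
    sigma = features.std(axis=0, keepdims=True)
    return (features - mu) / (sigma + eps)

# Usage: normalize a (frames x dims) matrix of filtered speech features.
feats = np.random.randn(300, 39)          # stand-in for 39-dimensional features
feats_ms = mean_subtraction(feats)
feats_mvn = mean_variance_normalization(feats)
print(feats_mvn.mean(axis=0).max(), feats_mvn.std(axis=0).mean())
```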
Abstract (English):
In this thesis, we explore various properties of cepstral time coefficients (CTC) in speech recognition and then propose several methods to refine the CTC construction process. It is found that CTC are filtered versions of mel-frequency cepstral coefficients (MFCC), where the filters come from the rows of the discrete cosine transform (DCT) matrix. We modify these DCT-based filters by windowing, removing the DC gain, and varying the filter length. Speech recognition experiments on the Aurora-2 digit database show that the proposed methods enhance the original CTC and improve the recognition accuracy, with a relative error reduction of around 20%.
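To make the CTC construction described above concrete, the sketch below treats rows of a DCT-II matrix as FIR filters applied along the time trajectory of a cepstral coefficient, and then applies two of the modifications named in the abstract: windowing the filter coefficients and removing the DC gain (forcing the taps to sum to zero). Varying the `filter_length` argument corresponds to the third modification. This is an illustrative sketch under assumed settings, not the thesis implementation; the 9-frame filter length, the Hamming window, and all function names are assumptions.

```python
# Illustrative sketch (not the thesis code): DCT-based temporal filters for
# cepstral trajectories, modified by windowing and DC-gain removal.
import numpy as np
from scipy.signal import lfilter

def dct_temporal_filters(filter_length=9, num_filters=3):
    """Rows k = 1..num_filters of an (unnormalized) DCT-II matrix, used as FIR
    filters along the time axis; higher orders resemble delta-like dynamic features."""
    n = np.arange(filter_length)
    return np.array([np.cos(np.pi * k * (2 * n + 1) / (2 * filter_length))
                     for k in range(1, num_filters + 1)])

def modify_filter(h, window=True, remove_dc=True):
    """Window the filter taps and force zero DC gain (sum of taps = 0)."""
    if window:
        h = h * np.hamming(len(h))
    if remove_dc:
        h = h - h.mean()   # zero response at zero modulation frequency
    return h

# Usage: filter the time trajectory of one MFCC dimension (random stand-in data).
mfcc_trajectory = np.random.randn(200)        # 200 frames of one cepstral coefficient
for h in dct_temporal_filters(filter_length=9):
    h_mod = modify_filter(h)
    filtered = lfilter(h_mod, [1.0], mfcc_trajectory)
    print(round(float(h_mod.sum()), 6), filtered.shape)
```

Removing the DC gain makes each filter block the constant (0 Hz) component of the cepstral trajectory, which is where slowly varying channel effects tend to concentrate.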
誌謝 (Acknowledgements) i
中文摘要 (Chinese Abstract) ii
Abstract iii
Contents iv
List of Figures vi
List of Tables viii
1 Introduction 1
1.1 Motivation and Background 1
1.2 Brief Introduction of the Proposed Approach 2
1.3 Framework of the Thesis 3
2 Brief Overview of Some DCT-based Robustness Techniques 4
2.1 DCT for Feature Coding Scheme 4
2.2 DCT for Capturing Both the Spectral and Modulation Information 6
2.3 Cepstral Time Coefficients from DCT 6
2.4 Summary 8
3 Brief Introduction of Discrete Cosine Transform 9
3.1 The Relationship Between DFT and DCT 9
3.2 Properties of DCT 12
3.2.1 Orthogonality 12
3.2.2 Energy Compaction 12
3.2.3 Decorrelation 12
3.3 Using Discrete Cosine Transform for Temporal Filtering 13
4 The New DCT-related Temporal Filtering Techniques 14
4.1 The Analysis of the CTC 14
4.2 Windowing the DCT-based Filter Coefficients 16
4.3 Varying the Filter Length 20
5 The Recognition Experiment Results and Discussions 23
5.1 The Experimental Environment Setup 23
5.2 Experimental Results of MFCC and CTC Features 25
5.2.1 The experimental results of MFCC (baseline) 25
5.2.2 The experimental results of the original CTC features 26
5.3 The Experimental Results of the Proposed New DCT-based Filtering Approach 27
5.3.1 The results for the windowed DCT-based filters with fixed filter length 28
5.3.2 The results for the windowed DCT-based filters with varying filter length 31
5.4 The Experimental Results of Combining the Proposed New DCT-based Filtering Approaches with Statistics Normalization Methods 37
6 Conclusion and Future Work 39
6.1 Conclusion 39
6.2 Future Work 39
References 41