Graduate Student: Hao-Teng Fan (范顥騰)
Thesis Title: Sub-band Processing in Various Domains of Speech Features for Robust Speech Recognition (各域之語音特徵分頻帶處理於強健性語音辨識之研究)
Advisor: Jeih-Weih Hung (洪志偉)
Oral Defense Committee: Berlin Chen (陳柏琳), Eric S. Li (李士修), Yu Tsao (曹昱), Jung-Shan Lin (林容杉), Jeih-Weih Hung (洪志偉)
Date of Oral Defense: 2014-04-09
Degree: Doctoral
Institution: National Chi Nan University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic dissertation
Year of Publication: 2014
Graduation Academic Year: 102
Language: English
Number of Pages: 113
Keywords: discrete wavelet transform, robust speech recognition, temporal processing, spatial processing, modulation spectrum
Automatic speech recognition systems are vulnerable to noise interference, which degrades their recognition accuracy. To counter this problem, a series of robust speech feature techniques have been proposed to improve recognition performance in noisy environments. This dissertation investigates the importance of different frequency ranges to speech recognition, refines the conventional full-band processing schemes, and develops sub-band robustness methods in several domains, as follows:

Temporal domain: wavelet de-noising and sub-band feature statistics normalization based on the wavelet transform
Modulation spectrum domain: sub-band modulation spectrum normalization and modulation spectrum power-law expansion
Spatial domain: weighted sub-band histogram equalization

The Aurora-2 connected-digit corpus and the Aurora-4 large-vocabulary corpus are used to evaluate the recognition performance of the new methods. The experimental results show that all of the above sub-band processing methods effectively improve the recognition accuracy under noise interference and outperform their conventional full-band counterparts, indicating that the proposed speech features possess better noise robustness.
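To make the temporal-domain idea concrete, the following minimal Python sketch applies wavelet threshold de-noising to a single normalized feature trajectory. The wavelet ("db4"), the decomposition depth, the soft-thresholding mode, and the Donoho-Johnstone universal threshold are illustrative assumptions, not the dissertation's exact settings; the dissertation compares several threshold selection rules and noise variance assignments (cf. Figures 2.9-2.11).

import numpy as np
import pywt  # PyWavelets

def wavelet_threshold_denoise(x, wavelet="db4", level=3):
    # x: the trajectory of one feature dimension (e.g., MFCC c1) across
    # the frames of an utterance, assumed already normalized (e.g., by MVN).
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Estimate the noise level from the finest detail sub-band via the
    # median absolute deviation, then form the universal threshold
    # theta = sigma * sqrt(2 ln N) of Donoho and Johnstone [31].
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    theta = sigma * np.sqrt(2.0 * np.log(len(x)))
    # Keep the approximation (low-pass) sub-band intact and soft-threshold
    # only the detail (high-pass) sub-bands, where noise tends to dominate.
    denoised = [coeffs[0]] + [pywt.threshold(c, theta, mode="soft")
                              for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(x)]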
The environmental mismatch caused by additive noise and/or channel distortion often dramatically degrades the performance of an automatic speech recognition (ASR) system. In order to reduce this mismatch, plenty of robustness techniques have been developed. This dissertation proposes several novel methods that use sub-band processing in different domains of speech features to improve the noise robustness of speech recognition.
Briefly speaking, in this dissertation we investigate the noise effect in three domains of speech features and then develop the respective countermeasures. First, we present the methods of wavelet threshold de-noising and sub-band feature statistics normalization, which are applied in the temporal domain. Second, two modulation-domain algorithms, sub-band modulation spectrum normalization and modulation spectrum power-law expansion, are developed and evaluated. Finally, we provide a novel scheme, called weighted sub-band histogram equalization, that processes the high-pass and low-pass portions of the spatial-domain features separately.
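As a compact illustration of the modulation-domain processing, the Python fragment below raises the modulation magnitude spectrum of one feature trajectory to a power while preserving its phase. The exponent value used here is only a placeholder: the dissertation tunes the exponent on a development set (Table 3.6) and also studies a low-band variant.

import numpy as np

def modulation_spectrum_power_law(x, alpha=1.1):
    # x: one cepstral coefficient's values across the frames of an
    # utterance; its DFT along time is the modulation spectrum.
    spectrum = np.fft.rfft(x)
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    # Power-law expansion: alpha > 1 stretches the modulation magnitude
    # (emphasizing its dynamic range) while the phase is kept unchanged.
    expanded = (magnitude ** alpha) * np.exp(1j * phase)
    return np.fft.irfft(expanded, n=len(x))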
The presented novel methods are examined on two databases, Aurora-2 and Aurora-4. The corresponding experimental results show that these sub-band methods perform better than the respective full-band methods in most cases, and that they benefit the speech recognition process significantly by improving the recognition accuracy under a wide range of noise environments.
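For the spatial-domain scheme, one possible reading of weighted sub-band histogram equalization is sketched below: each frame's cepstral vector is split into complementary low-pass and high-pass "spatial" portions by an intra-frame DFT, each portion is histogram-equalized toward a Gaussian reference along time, and the two streams are recombined with a weight α. The cut-off index, the rank-based equalization, and the single recombination weight are simplifying assumptions; the dissertation's WS-HEQ defines two structures (Figure 4.4) with the weight chosen on a development set (Table 4.1).

import numpy as np
from scipy.stats import norm

def histogram_equalize(values):
    # Map the empirical distribution of one feature dimension (across all
    # frames of an utterance) onto a standard Gaussian reference.
    ranks = np.argsort(np.argsort(values))
    u = (ranks + 0.5) / len(values)      # empirical CDF values in (0, 1)
    return norm.ppf(u)

def weighted_subband_heq(C, alpha=0.6):
    # C: (T frames) x (D cepstral coefficients) feature matrix.
    T, D = C.shape
    k = D // 4                           # illustrative low/high cut-off
    mask = np.zeros(D)
    mask[: k + 1] = 1.0                  # low "spatial" frequencies
    mask[D - k :] = 1.0                  # their conjugate counterparts
    S = np.fft.fft(C, axis=1)            # intra-frame (spatial) DFT
    C_low = np.real(np.fft.ifft(S * mask, axis=1))
    C_high = C - C_low                   # complementary high-pass portion
    # Equalize each portion along the time axis, then recombine.
    C_low = np.apply_along_axis(histogram_equalize, 0, C_low)
    C_high = np.apply_along_axis(histogram_equalize, 0, C_high)
    return alpha * C_low + (1.0 - alpha) * C_high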
Contents
誌謝 i
中文摘要 ii
Abstract iii
Contents v
List of Figures viii
List of Tables xi
1. Introduction 1
Background......................................................................1
1.1 Some well-known noise-robust techniques in speech recognition ..............3
1.2 Framework of this dissertation .............................................4
2. Temporal-domain sub-band processing 7
2.1 Wavelet de-noising in temporal sequences of speech feature..................7
2.1.1 Introduction..............................................................7
2.1.2 Basic formulation of discrete wavelet transform...........................9
2.1.3 Wavelet thresholding de-noising..........................................11
2.1.4 Wavelet threshold denoising for normalized speech features...............14
2.1.5 Experimental results and discussions.....................................21
2.1.6 The effect of different parameter settings of WTD processing in recognition performance.........................................................29
2.2 Sub-band feature statistics normalization techniques based on discrete wavelet transform..............................................................32
2.2.1 Introduction.............................................................32
2.2.2 Sub-band feature statistics normalization method.........................33
2.2.3 Experimental results.....................................................35
2.3 Summary....................................................................43
3. Modulation-domain sub-band processing 44
3.1 Sub-band modulation spectrum normalization.................................44
3.1.1 Introduction.............................................................44
3.1.2 The proposed methods.....................................................45
3.1.3 Experimental results.....................................................49
3.2 Modulation spectrum power-law expansion....................................58
3.2.1 Background...............................................................58
3.2.2 Proposed approach........................................................59
3.2.3 Experimental results.....................................................62
3.3 Summary....................................................................72
4. Spatial-domain sub-band processing 73
4.1 Intra-frame cepstral sub-band weighting and histogram equalization.........73
4.1.1 Background...............................................................73
4.1.2 Overview of S-HEQ........................................................74
4.1.3 Proposed approach: WS-HEQ................................................75
4.1.4 Experimental results.....................................................81
4.2 The combinations of WS-HEQ and the methods presented in Chapters 2 and 3...99
4.2.1 Integrated with temporal-domain sub-band methods.........................99
4.2.2 Integrated with modulation-domain sub-band methods......................102
4.3 Summary...................................................................104
5. Conclusions and future works 105
5.1 Temporal-domain sub-band process..........................................105
5.2 Modulation-domain sub-band process........................................105
5.3 Spatial-domain sub-band process...........................................106
References 108

List of Figures
1.1 Overall ASR system (adapted from [1]).......................................3
1.2 Various domains in MFCC feature sequence....................................6
2.1 A one-level discrete wavelet transform (DWT)...............................10
2.2 A three-level discrete wavelet transform (DWT).............................10
2.3 A one-level inverse discrete wavelet transform (IDWT)......................10
2.4 A three-level inverse discrete wavelet transform (IDWT)....................11
2.5 The hard and soft thresholding functions in Eqs. (2.6) and (2.7) given the threshold θ = 1................................................................13
2.6 The averaged PSD curves of the feature streams of each gender at three SNRs (clean, 10 dB and 0 dB), corresponding to the MFCC c1 features processed by MVN, CGN and CHN, respectively..................................................17
2.7 The PSD curves of the feature streams for the utterances from the male and female speakers at three SNRs (clean, 10 dB and 0 dB), corresponding to the MFCCs (a) processed by MVN and then WTD, (b) processed by CGN and then WTD, and (c) processed by CHN and then WTD. The left panel is for male speakers, and the right panel is for female speakers..............................................21
2.8 The averaged accuracy rates achieved by WTD on different sub-bands of (a) MVN features, (b) CGN features and (c) CHN features as to Aurora-2 clean-condition training............................................................30
2.9 The averaged accuracy rates achieved by WTD with different thresholding strategies (soft and hard), performed on (a) MVN features, (b) CGN features and (c) CHN features as to Aurora-2 clean-condition training......................30
2.10 The averaged accuracy rates achieved by WTD with different threshold selection rules (“heursure”, “rigrsure” and “sqtwolog”), performed on (a) MVN features, (b) CGN features and (c) CHN features as to Aurora-2 clean-condition training......................................................................31
2.11 The averaged accuracy rates achieved by WTD using different noise variance assignments (“mln”, “sln” and “one”), performed on (a) MVN features, (b) CGN features and (c) CHN features as to Aurora-2 clean-condition training..........31
2.12 The procedures of the proposed sub-band feature statistics normalization approaches, where the discrete wavelet transform (DWT) is used for the octave-band filter-bank analysis and synthesis.................................34
2.13 The averaged normalized c1 PSD curves after various normalization methods for three SNR levels (clean, 10 dB and 0 dB)...................................42
3.1 The normalized c1 PSD curves of the 1001 utterances (chosen from the subway noise of Test set A in the Aurora-2 database) after various normalization methods for three SNR levels, clean, 10 dB and 0 dB............................57
3.2 The MVN-processed MFCC c1 PSD curves at three SNR cases: 20 dB, 10 dB and 0 dB.............................................................................61
3.3 The overall recognition accuracy achieved by four MSPLE methods with different power factors as to Aurora-2 clean-condition training................68
3.4 The MFCC c1 PSD curves of the utterances with respect to different-gender speakers processed by various compensation methods.............................71
4.1 The structure of S-HEQ.....................................................75
4.2 Some information about the DFT-based spectrum of cepstra without CHN processing. (a) Recognition rates for the band-pass filtered cepstra. (b) The contribution of each individual spectral point.................................78
4.3 Some information about the DFT-based spectrum of CHN-processed cepstra. (a) Recognition rates for the band-pass filtered cepstra. (b) The contribution of each individual spectral point.................................................78
4.4 The flowcharts of two structures of WS-HEQ: (a) Structure I and (b) Structure II...................................................................81
4.5 The overall word error rate (%) averaged over all noise types and levels achieved by different noise-robustness methods under the Aurora-2 clean condition training mode........................................................84
4.6 The overall recognition accuracy achieved by two WS-HEQ methods with different α under the clean-condition training mode of Aurora-2. (a) WS-HEQ^(1)_I and (b) WS-HEQ^(1)_II....................................................93
4.7 Feature distortion averaged over the 1,001 utterances of Test Set A, achieved by S-HEQ, WS-HEQ^(1)_I (α = 0.6), and WS-HEQ^(1)_II (α = 0.6). The DFT size used in Eq. (4.9) is set to 512.........................................95
4.8 Flowcharts of three schemes: (a) Scheme 1. (b) Scheme 2. (c) Scheme 3......96
4.9 The averaged recognition accuracy (%) for WS-HEQ combined with temporal-domain sub-band methods under the clean-condition training mode of Aurora-2...101
4.10 The averaged recognition accuracy (%) for WS-HEQ combined with modulation-domain sub-band methods under the clean-condition training mode of Aurora-2...103

List of Tables
1.1 The possible variations faced by an ASR system [2]..........................2
2.1 The information of Aurora-2 database.......................................24
2.2 The baseline experimental setup of Aurora-4 database.......................24
2.3 The recognition accuracy rates (%) achieved by different methods under the clean-condition training mode of Aurora-2 database. RR (%) stands for the relative error rate reduction..................................................26
2.4 The recognition accuracy (%) achieved by various methods for different SNR values averaged over all noise types under the clean-condition training mode of Aurora-2 database..............................................................27
2.5 The recognition accuracy rates (%) achieved by different methods under the multi-condition training mode of Aurora-2 database.............................27
2.6 The recognition accuracy rates (%) achieved by different methods with the Aurora-4 database evaluation...................................................28
2.7 Recognition accuracy (%) achieved by various methods for the Aurora-2 clean-condition training task averaged across the SNRs between 0 and 20 dB. RR1 (%) and RR2 (%) are the relative error rate reductions over the baseline and the full-band methods, respectively...............................................39
2.8 Recognition accuracy (%) achieved by various methods for different SNR values averaged over all noise types in Test Sets A, B and C of the Aurora-2 clean-condition training task...................................................40
2.9 Recognition accuracy (%) achieved by various methods for the Aurora-2 multi-condition training task averaged across the SNRs between 0 and 20 dB...........40
2.10 Recognition accuracy (%) achieved by various methods for the Aurora-4 clean-condition training task...................................................41
3.1 Recognition accuracy (%) achieved by various methods processed on the original MFCC features, averaged across the SNRs between 0 dB and 20 dB of the Aurora-2 clean condition training task. RR1 (%) and RR2 (%) are the relative error rate reductions over the baseline and the corresponding full-band method, respectively...................................................................52
3.2 Recognition accuracy (%) achieved by various methods for different SNR values averaged over all noise types in Test Sets A, B and C of the Aurora-2 clean-condition training task...................................................52
3.3 Recognition accuracy (%) achieved by various methods processed on the original MFCC features, averaged across the SNRs between 0 dB and 20 dB of the Aurora-2 multi-condition training task........................................53
3.4 Recognition accuracy (%) achieved by various methods processed on the original MFCC features as to the Aurora-4 large vocabulary task................53
3.5 Recognition accuracy (%) achieved by various methods processed on the T_CMS-, T_MVN- and T_CHN-normalized MFCC features, averaged across the SNRs between 0 dB and 20 dB of the Aurora-2 clean-condition training task. RR1 (%) and RR2 (%) are the relative error rate reductions over the T_MVN and T_CHN baseline and the corresponding full-band method, respectively..................55
3.6 The optimal exponent α in MSPLE with respect to different pre-processing methods derived from the recognition results of the development set............63
3.7 Recognition accuracy (%) achieved by various methods for the Aurora-2 clean-condition training task averaged across the SNRs between 0 dB and 20 dB. RR (%) is the relative error rate reduction over the MFCC baseline.............65
3.8 The recognition accuracy results (%) of various methods for different SNR cases while averaged over ten noise situations as to the Aurora-2 clean condition training task........................................................65
3.9 Recognition accuracy (%) achieved by various methods for the Aurora-2 multi-condition training task averaged across the SNRs between 0 dB and 20 dB........66
3.10 The recognition accuracy results (%) of various methods for the Aurora-4 clean condition training task..................................................66
3.11 Recognition accuracy (%) achieved by various preprocessing methods and the low-band MSPLE with the various bandwidth ratios r defined in Eq. (3-13) based on the Aurora-2 clean-condition training mode...................................69
4.1 The scaling factor α for each type of WS-HEQ that obtains the optimal recognition accuracy in the development set as to the Aurora-2.................82
4.2 The recognition accuracy results (%) of the MFCC baseline, CHN, S-HEQ and WS-HEQ with structure I, which are for different Test Sets while averaged over 5 SNR conditions (20 dB to 0 dB) as to the Aurora-2 clean condition training. RR (%) is the relative error rate reduction compared with the MFCC baseline.......85
4.3 The recognition accuracy results (%) of the MFCC baseline, CHN and WS-HEQ with structure II, which are for different Test Sets while averaged over 5 SNR conditions (20 dB to 0 dB) as to the Aurora-2 clean condition training. RR (%) is the relative error rate reduction compared with the MFCC baseline...........86
4.4 The recognition accuracy results (%) of the MFCC baseline, CHN and eight forms of WS-HEQ, which are for different SNR cases while averaged over 10 noise situations as to the clean-condition training mode of Aurora-2 database........87
4.5 The recognition accuracy results (%) of the MFCC baseline, CHN and eight forms of WS-HEQ based on the Aurora-2 multi-condition training task............88
4.6 The recognition accuracy results (%) of the MFCC baseline, CHN and eight forms of WS-HEQ based on the Aurora-4 clean-condition training task............89
4.7 The recognition accuracy results (%) achieved by the combination of MVA and WS-HEQ, which are for different Test Sets while averaged over 5 SNR conditions (20 dB to 0 dB) as to the Aurora-2 clean-condition training task. The scaling factor α listed in Table 4.1 is adopted for each WS-HEQ.......................91
4.8 The recognition accuracy results (%) achieved by the combination of TSN and WS-HEQ, which are for different Test Sets while averaged over 5 SNR conditions (20 dB to 0 dB) as to the Aurora-2 clean-condition training task. The scaling factor α listed in Table 4.1 is adopted for each WS-HEQ.......................92
4.9 The recognition accuracy results (%) of the three schemes defined in sub-section 4.1.4.D, which are for the clean condition, the average of five SNR conditions (20 dB, 15 dB, 10 dB, 5 dB and 0 dB), and the -5 dB SNR condition, averaged over the ten noise types as to the Aurora-2 database..................98
References
[1] L. R. Rabiner and R. W. Schafer, “Theory and applications of digital speech processing,”
1st edition, Prentice Hall, 2010.
[2] The teaching materials of “Spoken Language Processing,” from Prof. Berlin Chen,
http://berlin.csie.ntnu.edu.tw/.
[3] B. S. Atal, “The history of linear prediction,” IEEE Signal Processing Magazine, 23(2),
pp. 154-161, 2006.
[4] S. B. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Transactions on Acoustics, Speech and Signal Processing, 28(4), pp. 357-366, 1980.
[5] H. Hermansky, “Perceptual linear predictive (PLP) analysis of speech,” Journal of the
Acoustical Society of America, 87(4), pp. 1738-1752, 1990.
[6] S. Furui, “Cepstral analysis technique for automatic speaker verification,” IEEE
Transactions on Acoustics, Speech and Signal Processing, 29(2), pp. 254-272, 1981.
[7] S. Tibrewala and H. Hermansky, “Multiband and adaptation approaches to robust speech recognition,” in Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), pp. 2619-2622, 1997.
[8] F. Hilger and H. Ney, “Quantile based histogram equalization for noise robust large vocabulary speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, 14(3), pp. 845-854, 2006.
[9] C.-W. Hsu and L.-S. Lee, “Higher order cepstral moment normalization for improved
robust speech recognition,” IEEE Transactions on Audio, Speech, and Language
Processing, 17(2), pp. 205-220, 2009.
[10] J. Du and R.-H. Wang, “Cepstral shape normalization (CSN) for robust speech
recognition,” in Proceedings of the IEEE International Conference on Acoustics,
Speech and Signal Processing, pp. 4389-4392, 2008.
[11] J.-W. Hung, J.-L. Shen and L.-S. Lee, “New approaches for domain transformation and
parameter combination for improved accuracy in parallel model combination (PMC)
techniques,” IEEE Transactions on Speech and Audio Processing, 9(8), pp. 842-855,
2001.
[12] J. H. Holmes and N. C. Sedgwick, “Noise compensation for speech recognition using
probabilistic models,” in Proceedings of the IEEE International Conference on
Acoustics, Speech and Signal Processing, pp. 741-744, 1986.
[13] A. Acero, L. Deng, T. Kristjansson and J. Zhang, “HMM adaptation using vector Taylor series for noisy speech recognition,” in Proceedings of the International Conference on Spoken Language Processing, pp. 869-872, 2000.
[14] J.-L. Gauvain and C.-H. Lee, “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” IEEE Transactions on Speech and Audio Processing, 2(2), pp. 291-298, 1994.
[15] C. J. Leggetter and P. C. Woodland, “Maximum likelihood linear regression for speaker adaptation of continuous density HMMs,” Computer Speech and Language, 9(2), pp. 171-185, 1995.
[16] M. J. F. Gales and S. J. Young, “Cepstral parameter compensation for HMM recognition
in noise,” Speech Communication, 12(3), pp. 231-239. 1993.
[17] L. Bahl, P. Brown, P. de Souza and R. Mercer, “Maximum mutual information
estimation of hidden Markov model parameters for speech recognition,” in Proceedings
of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp.
49-52, 1986.
[18] S. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE
Transactions on Acoustics, Speech and Signal Processing, 27(2), pp. 113-120, 1979.
[19] M. Berouti, R. Schwartz and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 208-211, 1979.
[20] S. Kamath and P. Loizou, “A multi-band spectral subtraction method for enhancing
speech corrupted by colored noise,” in Proceedings of the IEEE International
Conference on Acoustics, Speech and Signal Processing, pp. IV-4164, 2002.
[21] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech and Signal Processing, 32(6), pp. 1109-1121, 1984.
[22] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error
log-spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech and Signal
Processing, 33(2), pp. 443-445, 1985.
[23] C. Plapous, C. Marro, and P. Scalart, “Improved signal-to-noise ratio estimation for
speech enhancement,” IEEE Transactions on Audio, Speech and Language Processing,
14(6), pp. 2098-2108, 2006.
[24] P. Scalart and J. V. Filho, “Speech enhancement based on a priori signal to noise
estimation,” in Proceedings of the IEEE International Conference on Acoustics, Speech
and Signal Processing, pp. 629-632, 1996.
[25] V. Grancharov, J. Samuelsson and W. B. Kleijn, “On causal algorithms for speech enhancement,” IEEE Transactions on Audio, Speech, and Language Processing, 14(3), pp. 764-773, 2006.
[26] K. Paliwal, K. Wójcicki and B. Schwerin, “Single-channel speech enhancement using
spectral subtraction in the short-time modulation domain,” Speech Communication, 52(5), pp. 450-475, 2010.
[27] K. Paliwal, B. Schwerin and K. Wójcicki, “Speech enhancement using minimum mean-square error short-time spectral modulation magnitude estimator,” Speech Communication, 54(2), pp. 282-305, 2012.
[28] S. Yoshizawa, N. Hayasaka, N. Wada and Y. Miyanaga, “Cepstral gain normalization for noise robust speech recognition,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 209-212, 2004.
[29] J. C. Goswami and A. K. Chan, “Fundamentals of wavelets: Theory, algorithms, and
applications,” Wiley, 2nd edition, 2010.
[30] S. Mallat and W. L. Hwang, “Singularity detection and processing with wavelets,”
IEEE Transactions on Information Theory, 38(2), pp. 617-643, 1992.
[31] D. L. Donoho and I. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, 81(3), pp. 425-455, 1994.
[32] D. L. Donoho and I. M. Johnstone, “Adapting to unknown smoothness via wavelet
shrinkage,” Journal of the American Statistical Association, 90(432), pp. 1200-1224,
1995.
[33] S. G. Chang, B. Yu and M. Vetterli, “Adaptive wavelet thresholding for image denoising and compression,” IEEE Transactions on Image Processing, 9(9), pp. 1532-1546, 2000.
[34] N. Kanedera, T. Arai, H. Hermansky, and M. Pavel, “On the importance of various
modulation frequencies for speech recognition,” in Proceedings of the European
Conference on Speech Communication and Technology, pp. 1079-1082, 1997.
[35] C. M. Stein, “Estimation of the mean of a multivariate normal distribution,” The Annals
of Statistics, 9(6), pp. 1135-1151, 1981.
[36] http://www.mathworks.com/help/toolbox/wavelet/ref/wden.html
[37] C.-P. Chen and J. A. Bilmes, “MVA processing of speech features,” IEEE Transactions on Audio, Speech, and Language Processing, 15(1), pp. 257-270, 2007.
[38] H. G. Hirsch and D. Pearce, “The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions,” in Proceedings of the ISCA Workshop on Automatic Speech Recognition: Challenges for the New Millennium (ASR2000), pp. 181-188, 2000.
[39] http://htk.eng.cam.ac.uk
[40] N. Parihar and J. Picone, “Aurora Working Group: DSR front end LVCSR evaluation AU/384/02,” Institute for Signal and Information Processing, Mississippi State University, Mississippi State, Mississippi, 2002.
[41] The CMU Pronouncing Dictionary, http://www.speech.cs.cmu.edu/cgi-bin/cmudict,
Speech at Carnegie Mellon University, Carnegie Mellon University, Pittsburgh,
Pennsylvania, 2001.
[42] X. Xiao, E. S. Chng and H. Li, “Normalization of the speech modulation spectra for
robust speech recognition,” IEEE Transactions on Audio, Speech, and Language
Processing, 16(8), pp. 1662-1674, 2008.
[43] L.-C. Sun and L.-S. Lee, “Modulation spectrum equalization for improved robust
speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing,
20(3), pp. 828-843, 2012.
[44] J.-W. Hung and W.-Y. Tsai, “Constructing modulation frequency domain based features for robust speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, 16(3), pp. 563-577, 2008.
[45] H. Hermansky and N. Morgan, “RASTA processing of speech,” IEEE Transactions on Speech and Audio Processing, 2(4), pp. 578-589, 1994.
[46] J.-W. Hung and L.-S. Lee, “Optimization of temporal filters for constructing robust features in speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, 14(3), pp. 808-832, 2006.
[47] J.-W. Hung, W.-H. Tu and C.-C. Lai, “Improved modulation spectrum enhancement
methods for robust speech recognition,” Signal Processing, 92(11), pp. 2791-2814,
2012.
[48] H.-T. Fan, Y.-C. Lian and J.-W. Hung, “Modulation spectrum exponential weighting for
robust speech recognition,” in Proceedings of the International Conference on ITS
Telecommunications, pp. 812-816, 2012.
[49] V. Joshi, R. Bilgi, S. Umesh, L. García and M. C. Benítez, “Sub-band level histogram equalization for robust speech recognition,” in Proceedings of the International Conference on Spoken Language Processing, pp. 1661-1664, 2011.
[50] J. Benesty, M. M. Sondhi and Y. Huang (Eds.), “Springer Handbook of Speech
Processing,” Springer, 2008.
[51] D. Macho, L. Mauuary, B. Noé, Y. M. Cheng, D. Ealey, D. Jouvet, H. Kelleher, D.
Pearce, and F. Saadoun, “Evaluation of a noise-robust DSR front-end on Aurora
databases,” in Proceedings of the International Conference on Spoken Language
Processing, pp. 17-20, 2002.
