National Digital Library of Theses and Dissertations in Taiwan

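Author: Chao-Shih Huang (黃昭世)
Thesis title: Error Rate Reduction and Estimation for Robust Speech Recognition (強健性語音辨識之錯誤率改善與估測)
Advisors: Prof. Hsiao-Chuan Wang (王小川), Prof. Chin-Hui Lee (李錦輝)
Degree: Doctoral
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2003
Academic year of graduation: 91
Language: English
Number of pages: 97
Keywords (translated from Chinese): speech recognition; robustness technology; stochastic matching algorithm; error rate estimation
Keywords (English): speech recognition; robustness technology; stochastic matching; error rate estimation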
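This thesis proposes two new techniques for robust speech recognition systems, one for reducing the error rate and one for estimating it. First, we propose a robustness technique called the SNR-incremental stochastic matching algorithm, which reduces the mismatch between the training and testing conditions of a speech recognition system and thereby raises its recognition rate. Second, we estimate the error rate of a speech recognition system from its speech model parameters, so that the operating performance of a deployed system can be reported at any time, without running experiments, as a reference for improvement.
The SNR-incremental stochastic matching algorithm is mainly intended to reduce interference from additive noise. It addresses two problems in existing robustness techniques. First, when the training and testing conditions are poorly matched, maximum likelihood estimation performs badly because of a poor initial value. Second, with existing model compensation techniques the recognition rate is bounded by the matched-test-environment condition and cannot reach the recognition rate of clean speech. The algorithm first takes a set of speech models that match the test environment relatively well as its initial value, and then adapts the speech features and the model parameters together along the direction of increasing SNR until convergence. Because it starts from a better initial value and adapts toward higher SNR, its theoretical upper bound is clean-speech performance. Experiments confirm that, under noise, once the recognition rate has reached the upper bound set by the matched test environment, the algorithm can raise it further, with an error reduction rate of up to about 30%.
To monitor the performance of a speech recognition system at any time without additional experimental evaluation, we propose a technique that estimates the error rate from hidden Markov model parameters, a key technique urgently needed in both the theory and the practical application of speech recognition. However, speech recognition is a one-out-of-many comparison process, and because repeated utterances of the same sound do not always have the same length, dynamic matching is required; this complexity makes it hard to express the error rate in a simple mathematical formula. To overcome these difficulties, we first use a one-dimensional misclassification measure to estimate the probabilistic distance between a particular model and its competitors. This measure is passed through a smoothed zero-one function to obtain the error rate of that model, and the error rates of all models are then weighted to give the error rate of the whole speech recognition system. The key to the method is estimating each model's misclassification measure without test speech. We present the procedure step by step, from a simple Gaussian distribution to Gaussian mixture models and then to hidden Markov models, with applications ranging from speaker recognition to the recognition of isolated sounds and continuous speech. Experiments show that the error rates of speech and speaker recognition can be estimated at any time, with or without noise, which provides the real-time information a speech recognition system needs in practical operation.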

In this thesis, we propose two new techniques for robust speech recognition, one for reducing the error rate and one for estimating it. The first is a robustness technique, the SNR-incremental stochastic matching (SISM) algorithm, which compensates for the mismatch between the training and testing conditions and thereby improves recognition performance. The second is error rate estimation, whose goal is to monitor recognition performance so that appropriate schemes for improving recognition accuracy can be applied when performance degrades.
The SISM algorithm extends Sankar and Lee’s stochastic matching (SM) to deal with the distortion caused by additive noise. We address two issues with the original maximum-likelihood-based SM techniques. First, the initial condition of the expectation-maximization (EM) algorithm has to be set carefully when the mismatch between training and testing is large. Second, in noise compensation the performance is often limited by that of the newly adapted model and does not reach the higher accuracy typically obtained in clean environments. The proposed SISM algorithm attempts to improve the initial condition and to relax this performance bound in two ways. First, it provides a good initial condition by using a set of environment-matched models. Second, it operates recursively: the reference model in each recursion is moved in the direction of increasing SNR, in order to push recognition performance toward that obtained at higher SNR levels. Experimental results show that the SISM algorithm yields further improvement after the best environment-matched performance has been reached, and that it gains additional discriminative power by using speech models of higher SNR rather than by retraining.
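As a rough illustration of the recursion described above, the following toy sketch assumes one-dimensional features, a single Gaussian per SNR level, and a purely additive bias applied in the feature space only. None of these simplifications comes from the thesis, and the names `sism_sketch`, `avg_log_likelihood`, and `models_by_snr` are hypothetical. At each step the bias is set to its maximum-likelihood value for a single Gaussian (sample mean minus model mean), the features are shifted, and the reference model moves to the next higher SNR.

```python
import math
from statistics import fmean


def avg_log_likelihood(features, mean, var):
    """Average log-likelihood of 1-D features under a single Gaussian."""
    sq = fmean([(x - mean) ** 2 for x in features])
    return -0.5 * (math.log(2 * math.pi * var) + sq / var)


def sism_sketch(features, models_by_snr):
    """Toy SNR-incremental recursion (illustrative assumption, not the thesis code).

    models_by_snr maps an SNR level in dB to a (mean, variance) pair.
    Returns the compensated features and the SNR levels visited.
    """
    snr_levels = sorted(models_by_snr)
    # Initial condition: the environment-matched model, taken here to be
    # the one under which the raw features are most likely.
    start = max(
        snr_levels,
        key=lambda s: avg_log_likelihood(features, *models_by_snr[s]),
    )
    compensated = list(features)
    visited = []
    # Move the reference model toward higher SNR, one level per recursion.
    for snr in snr_levels[snr_levels.index(start):]:
        ref_mean, _ref_var = models_by_snr[snr]
        # ML estimate of an additive bias for a single Gaussian reference.
        bias = fmean(compensated) - ref_mean
        compensated = [x - bias for x in compensated]
        visited.append(snr)
    return compensated, visited


# Hypothetical models: noisier conditions have larger mean and variance.
models = {0: (5.0, 4.0), 10: (3.0, 2.0), 20: (1.0, 1.0)}
noisy = [2.5, 3.5, 3.0, 2.0, 4.0]
compensated, visited = sism_sketch(noisy, models)
print(visited, compensated)
```

In this single-Gaussian setting the final shift could be computed in one step, so the loop only mirrors the recursive structure. With mixture models and EM-based bias estimation, each intermediate reference model would instead supply the initial condition for the next recursion.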
A model-based error rate estimation framework is proposed for speech and speaker recognition. It aims to predict the performance of a hidden Markov model (HMM) based recognition system for a given task vocabulary and grammar without running recognition experiments on a separate set of test samples, which is highly desirable both in theory and in practice. However, the error rate of an HMM-based speech recognition system has no closed-form expression, because of the complexity of the multi-class comparison process and the need for dynamic time warping to handle speech patterns of different sizes and lengths. To alleviate this difficulty, we propose a one-dimensional, model-based misclassification measure that evaluates the distance between a particular model of interest and a combination of many of its competing models. The error rate for a class characterized by an HMM is then the value of a smooth zero-one error function evaluated at the misclassification measure, and the overall error rate of the task vocabulary is computed as a function of all the available class error rates. The key is to evaluate the misclassification measure accurately without using any speech data. In this thesis, we show how the misclassification measure can be approximated by first computing the distance between two mixture Gaussian densities, then between two HMMs with mixture Gaussian state observation densities, and finally between two sequences of HMMs. Compared with the error rates obtained in actual experiments, the estimates produced by the new framework without any test data are accurate for many types of speech and speaker recognition problems. Within the same framework, it is also demonstrated that the error rate of a recognition system in a noisy environment can be predicted.
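The pipeline described above (model distance, misclassification measure, smooth zero-one function, weighted combination) can be sketched for the simplest case of one univariate Gaussian per class. The sketch is an illustration under stated assumptions, not the thesis's formulation: it uses the Bhattacharyya distance as the model distance, considers only the nearest competitor rather than a combination of competitors, and maps the measure through a sigmoid whose `slope` and `offset` parameters are hypothetical.

```python
import math


def bhattacharyya(m1, v1, m2, v2):
    """Bhattacharyya distance between two univariate Gaussians (mean, variance)."""
    v = 0.5 * (v1 + v2)
    return 0.125 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / math.sqrt(v1 * v2))


def smooth_zero_one(d, slope=1.0, offset=0.0):
    """Sigmoid used as a smooth zero-one error function (numerically stable)."""
    z = slope * (d + offset)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def class_error(target, competitors, slope=1.0, offset=0.0):
    """Estimated error rate of one class, computed from model parameters only."""
    # Assumed misclassification measure: the negative distance to the closest
    # competing model, so more confusable models give a larger measure.
    d = -min(bhattacharyya(*target, *c) for c in competitors)
    return smooth_zero_one(d, slope, offset)


def system_error(models, priors, slope=1.0, offset=0.0):
    """Prior-weighted combination of the class error rates."""
    total = 0.0
    for name, model in models.items():
        others = [m for n, m in models.items() if n != name]
        total += priors[name] * class_error(model, others, slope, offset)
    return total


# Hypothetical two-class examples: overlapping versus well-separated models.
priors = {"a": 0.5, "b": 0.5}
close = {"a": (0.0, 1.0), "b": (1.0, 1.0)}
far = {"a": (0.0, 1.0), "b": (4.0, 1.0)}
print(system_error(close, priors), system_error(far, priors))
```

No test utterances enter the computation; the estimate depends only on the model parameters and the priors. In practice the sigmoid parameters would need to be calibrated, and the distance would have to be extended to mixture densities and HMM sequences as the abstract outlines.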

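Abstract
Acknowledgements
Contents
Chapter 1 Introduction
Chapter 2 Hidden Markov Model-based Speech Recognition
Chapter 3 SNR-incremental Stochastic Matching Algorithm
Chapter 4 Estimating the Error Rate of a Speech Recognition System from Speech Model Parameters
Chapter 5 Conclusions
English Version of the Doctoral Dissertation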
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Automatic Speech Recognition
1.2 Thesis Goals
1.2.1 Robustness Technologies
1.2.2 Error Rate Estimation
1.3 Organization of the Dissertation
Chapter 2 Hidden Markov Model-based Speech Recognition
2.1 Introduction
2.2 Feature Extraction of Speech Signal
2.3 Hidden Markov Model
2.3.1 The Elements of HMM
2.3.2 The Assumptions of the HMM
2.3.3 Three Problems of the HMM
2.3.4 The Training Algorithms for HMM
2.4 Experimental Corpora
Chapter 3 SNR-incremental Stochastic Matching for Noisy Speech Recognition
3.1 Survey of Previous Work
3.2 Stochastic Matching and Performance Issues
3.3 SNR-incremental Stochastic Matching (SISM)
3.4 Experimental Results
3.4.1 The Experiments for Performance Upper Bound Investigation
3.4.2 The Experiments for the SISM Performance Evaluation
3.5 Discussions
Chapter 4 Model-based Error Rate Estimation
4.1 Survey of Previous Work
4.2 Model-based Error Rate Estimation
4.2.1 The Definition of the Error Function
4.2.2 The Key Ideas of the Proposed Algorithm
4.2.3 The Key Steps to Approximate the Error Analysis
4.2.4 Performance Monitoring: Error Rate Estimation at Mismatched Conditions
4.3 Experimental Results
4.4 Discussions
Chapter 5 Conclusions
5.1 Summary of Results
5.2 Contributions
5.3 Suggestions for Future Work
Appendix
A. Feature-Space SM Algorithm
B. Log-Normal PMC Algorithm
C. Japanese Phone Table Used in This Thesis
References
Publication List (Papers & Patents)
