|
[1] A. Acero, “Acoustical and Environmental Robustness in Automatic Speech Recognition”, Boston, MA:Kluwer, 1992. [2] M. Afify, Y. Gong, and J. -P Haton, “ A general joint additive and convolutive bias compensation approach applied to noisy lombard speech recognition, IEEE Trans. on Speech and Audio Processing, vol. 6, No. 6, pp. 524-537, 1998. [3] B. Atal, “Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification”. Journal of Acoustic Society of America, vol. 55, pp.1304-1312, 1974. [4] L. R. Bahl, P. F. Brown, P. V. deSouza, and L. R. Mercer, “Maximum mutual information estimation of hidden Markov model parameters for speech recognition”, in Proc. ICASSP-86, pp. 49-52, 1986. [5] L. E. Baum, “An inequality and associate maximization technique in statistical estimation for probabilistic functions of Markov processes”, Inequalities, vol. 2, pp.1-8, 1972. [6] L. E. Baum, T. Petrie, G. Soules and N. Weiss, “ A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains”, Annals of Mathematical Statistics, vol. 41, pp.164-171, 1970. [7] A. Bhattacharyya, “On a measure of divergence between two statistical populations defined by their probability distributions”, Bulletin of the Calcutta Mathematical Society, vol. 35, pp. 99-110, 1943. [8] A. Biem, S. Katagiri, E. McDermott, and B. -H. Juang, “An application of discriminative feature extraction to filter-bank-based speech recognition”, IEEE Trans. on Speech and Audio Processing, vol. 9, No. 2, pp. 96-110, Feb. 2001. [9] S. Boll, “Suppression of acoustic noise in speech using spectral subtraction”, IEEE Trans. On Acoustic, Speech, and Signal Processing, vol. 27, No. 2, pp.113-120, 1979. [10] K. W. Bowyer and P. J. Phillips (Eds.), “Empirical evaluation techniques in computer vision”, IEEE Computer Society, Los Alamitos, CA, 1998. [11] H. Chernoff, “A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations”, Annals of Mathematical Statistics, vol. 23, pp. 493-507,1952. [12] W. Chou, C. -H. Lee and B. -H. Juang, “ Segmental GPD training of HMM based speed recognizer,” in Proc. ICASSP-92, pp. 473-476, 1992. [13] W. Chou, C. -H. Lee and B. -H. Juang, “Minimum error rate training based on N-best string models”, in Proc. ICASSP-93, pp. II-652-II-655, 1993. [14] W. Chou, C. -H. Lee and B. -H. Juang, “Minimum error rate training of inter-word context dependent acoustic model in speech recognition”, in Proc. ICSLP-94, pp. 439-442, 1994. [15] D. Van Compernolle, “Noise adaptation in a hidden Markov model speech recognition system,” Comput. Speech Lang., vol. 3, pp. 151—167,1989. [16] T. M. Cover and J. A. Thomas, “Elements of Information Theory”, McGraw-Hill, New York, 1967. [17] T. M. Cover, “Learning in pattern recognition”, In S. Watanabe (Ed.), “Methodologies of pattern recognition”, pp. 111-132, Academic Press, New York, 1969. [18] A. P. Davis and P. Mermelstien, “Comparison of parametric representation of monosyllabic word recognition in continuously spoken sentences”, IEEE Trans. Acoustic, Speech and Signal Processing, vol. 28, pp.357-366, 1980. [19] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, J. R. Stat. Soc. B, vol. 39, pp. 1-38, 1977. [20] R. O. Duda, P. E. Hart, and D. G. Stork, “Pattern Classification”, John Wiley & Sons, Inc., 2000. [21] B. Efron, “The Jackknife, the bootstrap and other resampling plans”, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1982. [22] B. Efron and R. J. Tibshirani, "Improvements on cross-validation: The .632+ bootstrap method," Journal of the American Statistical Association, vol. 92, pp. 548-560, 1997. [23] Y. Ephraim, A. Dembo, and L. R. Rabiner, “A minimum discrimination information approach for hidden Markov modeling”, IEEE Trans. Information Theory, vol. 35, No. 5, pp. 1001-1013, September, 1989. [24] Y. Ephraim, H. Lev-Ari, and R. M. Gray, “Asymptotic minimum discrimination information measure for asymptotically weakly stationary process”, IEEE Trans. Information Theory, vol. 34, No. 5, pp. 1033-1040, September, 1988. [25] Y. E phraim, “Gain-adapted hidden Markov models for recognition of clean and noisy speech”, IEEE Trans. Signal Processing, vol.40, no. 6, pp. 1303-1316, 1992. [26] A. Erell and M. Weintrub, “Energy-conditioned spectral estimation for recognition of noisy speech”, IEEE Trans. Speech and Audio Processing, vol.1, no. 1, pp. 84-89, 1994. [27] G. D. Forney, “The Viterbi algorithm”, Proc. IEEE, vol. 61, pp268-278, March, 1973. [28] S. Furui, “Cepstral analysis technique for automatic speaker verification,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 254—272, 1981. [29] S. Furui, “Speaker-independent isolated word recognition using dynamic features of speech spectrum,” IEEE Trans. Acoust., Speech,Signal Processing, vol. ASSP-34, pp. 52—59, 1986. [30] K. Fukunaga, “Introduction to statistical pattern recognition”, Academic Press, New York, 2nd edition, 1990. [31] M. Gales and S. Young, “Cepstral parameter compensation for HMM recognition in noise”, Speech Communication, vol. 12, pp. 231-239, July 1993. [32] M. Gales and S. Young, “Robust speech recognition in additive and convolutional noise using parallel model combination”, Computer Speech and Language, vol. 9, pp. 289-307, 1995. [33] M. Gales and S. Young, “A fast and flexible implementation of parallel model combina-tion,” in Proc. ICASSP, 1995, pp. 131—136. [34] J.-L. Gauvain and C.-H. Lee, “ Bayesian learning for hidden Markov models with Gaussian mixture state observation Densities,” Speech Communication, vol. 11, No. 2-3, pp. 205-214, 1992. [35] J.-L. Gauvain and C.-H. Lee, “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” IEEE Trans. Speech and Audio Processing, vol.2, pp. 291-298, 1994. [36] Y. Gong, “Speech recognition in noisy environments: A survey”, Speech Communication, vol. 16, pp. 261-291, 1995. [37] R. M. Gray, A. H. Gray, Jr., G. Rebolledo, and J. E. Shore, “Rate-distortion speech coding with a minimum discrimination information measure”, IEEE Trans. Information Theory, vol. 27, No. 6, pp. 708-721, Nov., 1981. [38] I. Guyon, J. Makhoul, R. Schwartz, and V. Vapnik. “ What size test set gives good error rate estimates?”, IEEE Trans. Pattern Recognition and Machine Intelligent, vol. 20, No. 1, pp.52-64., 1998. [39] J. Hansen. “Analysis and compensation of speech under stress and noise for en-vironmental robustness in speech recognition”. Speech Communication, vol., pp.20151—173, November 1996. [40] H. Hermansky, “Perceptual linear predictive (PLP) analysis of speech”, Journal of the Acoustical Society of America, vol. 87, pp. 1738-1752, 1990 [41] H. Hermansky and N. Morgan, “RASTA processing of speech,” IEEE Trans. Speech Audio Processing, vol. 2, pp. 578—589, 1994. [42] J. Hernando and C. Nadeu, “Speech recognition in noisy car environment based on OSALPC representation and robust similarity measuring techniques,” in Proc. ICASSP, 1994, pp. 69—72. [43] J.S. U. Hjorth, Computer Intensive Statistical Methods Validation, Model Selection, and Bootstrap, London: Chapman & Hall, 1994. [44] J. Holmes and N. Sedgwick, “Noise compensation for speech recognition using probabilistic models”, in Proc. ICASSP-86, 1986. [45] C. -S. Huang and D. Langmann, “Performance evaluation of adapted and retrained models for noisy speech recognition”, in Proc. Int. Sym. Chinese Spoken Language Processing, 1998, pp. 216-219. [46] C. -S. Huang and H. -C. Wang, “An SNR-incremental stochastic matching (SISM) algorithm for noisy speech recognition, in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding, 1999, pp. 39-42. [47] C. -S. Huang and H. -C. Wang, “A divergence-based model separation”, in Proc. Int. Sym. Chinese Spoken Language Processing, 2000, pp. 231-234. [48] C. -S. Huang, H. -C. Wang, and C. -H. Lee, “An SNR-incremental stochastic matching algorithm for noisy speech recognition”, IEEE Trans. on Speech and Audio Processing, vol. 9, No. 8, pp. 866-873, Nov. 2001. [49] C. -S. Huang and H. -C. Wang, “Bandwidth-adjusted LPC analysis for robust speech recognition”, accepted by Pattern Recognition Letters, Oct. 2002. [50] J. -W. Hung, J. -L. Shen, and L. -S. Lee, “New approaches for domain transformation and parameter combination for improved accuracy in parallel model combination (PMC) techniques”, IEEE Trans. on Speech and Audio Processing, vol. 9, No. 8, pp. 842-855, Nov. 2001. [51] “IEEE Standard 610.12-1990”, IEEE standard glossary of software engineering terminology, ISBN 1-55937-067-X, 1990. [52] G. Kohn, “IEEE Standard Dictionary of Electrical and Electronics Terms”, IEEE, 6th ed., 1997. [53] A. K. Jain, R. C. Dubes, and C. -C. Chen, “Bootstrap techniques for error estimation”, IEEE Trans. Pattern Recognition and Machine Intelligent, vol. 9, No. 5, pp.628-633, 1998. [54] B. -H. Juang, “ Maximum-likelihood estimation for mixture multivariate stochastic observations of Markov chains,” AT&T Technical Journal, vol. 64, 1985. [55] B. -H. Juang and L. R. Rabiner, “A probabilistic distance measure for hidden Markov models”, AT&T Technical Journal, vol. 64, No. 2, pp. 391-408, Feb. 1985. [56] B. -H. Juang, L. R. Rabiner, and J. G. Wilpon, “On the use of bandpass liftering in speech recognition”, IEEE Trans. Acoustic, Speech, Signal Processing, vol. 35, pp. 947-954, 1987. [57] B. -H. Juang, “Speech recognition in adverse environments”, Computer Speech and Language, vol 5, pp. 275-294, 1991. [58] B. -H. Juang and S. Katagiri, “Discriminative learning for minimum error classification,” IEEE Trans. Signal Processing, vol.40, pp. 3043-3054, 1992. [59] B. -H. Juang, W. Chou, and C. -H. Lee, “Minimum classification error rate methods for speech recognition”, IEEE Trans. on Speech and Audio Processing, vol. 5, No. 3, pp. 257-265, May 1997. [60] Y. Kharin, “Robustness in Statistical Pattern Recognition”, Kluwer Academic Publishers, 1992. [61] S. Kullback, “Information Theory and Statistics”, New York: Dover, 1968. [62] C. -H. Lee, “On stochastic feature and model compensation approaches to robust speech recognition”, Speech Communication, vol. 25, pp. 29-47, August 1998. [63] K. F. Lee, Large-vocabulary Speaker-independent Continuous Speech Recognition: The SPHINX System, Ph. D. Thesis, Carnegie-Mellon University, 1988. [64] C. J. Legetter and P. C. Woodland, “ Flexible Speaker Adaptation using Maximum Likelihood Linear Regression”, ARPA Workshop on Spoken Language System Technology, pp. 110-115, 1995. [65] S. E. Levinson, L. R. Rabiner, and M. M. Sandhi, “ An introduction to the application of the theory of probabilitstic functions of a Markov process to automatic speech recognition”, The Bell System Technical Journal, vol. 62, pp.1035-1074, 1983. [66] J. Lim and A. Oppenhiem, “All-pole modeling of degraded speech”, IEEE Trans. On Acoustic, Speech, and Signal Processing, vol. 26, No. 3, pp.197-210, 1978. [67] L. R. Liporace, “ Maximum likelihood estimation for multivariate observations of Markov sources, “ IEEE Trans. Information Theory, vol. 28, pp. 729-734, 1982. [68] C. -S. Liu, et al, “A study on minimum error discriminative training for speaker recognition”, J. Acoust. Soc. Amer., vol 97, no. 1, pp637-648, 1995. [69] P. Lockwood and P. Alexandre, “Root adaptive homomorphic deconvolution schemes for speech recognition in noise”, In Proc. ICASSP-94, vol. 1, pp. 441-444, 1994. [70] D. Mansour and B. H. Juang, “The short-time modified coherence representation and noisy speech recognition,” IEEE Trans. Signal Processing, vol. 37, pp. 795—804, June 1989. [71] D. Mansour and B. -H. Juang, “A family of distortion measures based upon projection operation for robust speech recognition”, IEEE Trans. on Acoustic, Speech, and Signal Processing, vol. ASSP-37, No. 11, pp. 1659-1671, 1989. [72] P. J. Moreno, “Speech recognition in noisy environments”, Ph.D. Thesis, Carnegie Mellon Univ., May 1996. [73] P. J. Moreno, B. Raj, and R. M. Stern, “Data-driven environmental compensation for speech recognition: A unified approach,” Speech Commun., vol. 24, pp. 267—285, 1998. [74] A. Nadas, D. Nahamoo, and M. Picheny, “Speech recognition using noise-adaptive prototype”, in Proc. ICASSP-88, pp. 517-520, 1988. [75] “NIST speech quality assurance (SPQA) package”, version 2.4, National Institute of Standards and Technology (NIST), Nov. 1994. [76] J. A. Nolazco-Flores and S. J. Young, “Adapting a HMM based recognizer for noisy speech enhanced by spectral subtraction”, In Proc. Eurospeech’93, pp. 829-832, 1993. [77] J. A. Nolazco-Flores and S, J. Young, “Continuous speech recognition in noise using spectral subtraction and HMM adaptation”, In Proc. ICASSP-94, vol. 1, pp. 409-412, 1994. [78] L. R. Rabiner, B. -H. Juang, S. E. Levinsion, and M. M. Sondhi, “Recognition of isolated digits using hidden Markov models with continuous mixture density”, AT&T Technical Journal, vol. 64. pp1211-1234, 1985. [79] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition”, Proc. IEEE, vol. 77, pp. 257-286, Feb. 1989. [80] L. R. Rabiner and B. -H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993. [81] L. R. Rabiner, B. -H. Juang, and C. -H. Lee, “An overview of automatic speech recognition”. In C. -H Lee, F. K. Song and K. K. Paliwal (Eds.), “Automatic Speech and Speaker Recognition: Advanced Topics”, Chapter 1, Kluwer Academic Publishers, 1996. [82] M. Rahim and C. -H. Lee, "Simultaneous feature and HMM design using string-based minimum classification error training criterion," Proc. ICSLP-96, pp. 1820-1823, Philadelphia, Oct. 1996. [83] D. A. Reynolds and R. C. Rose, “Robust text-independent speaker identification using Gaussian mixture speaker models”, IEEE Trans. Speech and Audio Processing, vol.3, pp. 72-83, 1995. [84] R. Rose, E. M. Hofstetter, and D. A. Reynolds, “Integrated models of speech and background with applications to speaker identification in noise”, IEEE Trans. on Speech and Audio Processing, vol. 2, pp. 245-257, 1994. [85] A. E. Rosenberg, J. Delong, C. -H. Lee, B. -H. Juang, and F. K. Soong, “The use of cohort normalized scores for speaker recognition”, Proc. ICSLP-92, Banff, pp.599-602, Oct. 1992. [86] A. Sankar and C. -H. Lee, “A maximum likelihood approach to stochastic matching for robust speech recognition”, IEEE Trans. on Speech and Audio Processing, vol. 4, No. 3, pp. 190-202, May 1996. [87] O. Siohan, and C. -H. Lee, “Iterative noise and channel estimation under the stochastic matching algorithm framework”, IEEE Signal Processing Letter, vol. 4, No. 11, pp. 304-306, November 1997. [88] F. K. Soong and A. E. Rosenberg, “On the use of instantaneous and transitional spectral information in speaker recognition,” IEEE Trans. Acoust. Speech Signal Process, vol. 36, 1988. [89] R. M. Stern, A. Acero, F. -H Liu, and Y. Ohshima, “Signal processing for robust speech recognition”. In C. -H Lee, F. K. Song and K. K. Paliwal (Eds.), “Automatic Speech and Speaker Recognition: Advanced Topics”, Chapter 15, Kluwer Academic Publishers, 1996. [90] W. -H. Tsai, “Automatic identification and indexing of Chinese multilingual spoken language”, Ph. D. Dissertation, Institute of Communication Engineering, National Chiao Tung University, Hsinchu, Taiwan, May, 2001. [91] A. Varga, R. Moore, J. Bridle, K. Ponting, and M. Russell, “Noise compensation algorithms for use with hidden Markov model based speech recognition”, in Proc. ICASSP-88, pp. 481-484, 1988. [92] A. Varga and R. Moore, “Hidden Markov model decomposition for speech and noise”, in Proc. ICASSP-90, pp. 845-848, 1990. [93] A. Varga and H. J. M. Steeneken, “Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems”, Speech Communication, vol. 12, pp. 247-251, 1993. [94] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically optimal decoding algorithm”, IEEE Trans. Information Theory, IT-13, pp260-269, April 1967.
|