|
Abromovitz, M., Stegun, I. A., 1965. Handbook of Mathematical Functions. New York: Dover Publications, Inc. Anastasakos, A., Schwartz, R., Shu, H., 1995. Duration modeling in large vocabulary speech recognition. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), pp. 628-631. Bahl, L., Brown, P., de Souza, P., Mercer, R., 1986. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 11, pp. 49-52. Burshtein, D., 1996. Robust parametric modeling of durations in hidden Markov models. IEEE Transactions on Speech and Audio Processing, vol. 4, no. 3, pp. 240-242. Chen, K. T., Liau, W. W., Wang, H. M., Lee, L. S., 2000. Fast speaker adaptation using eigenspace-based maximum likelihood linear regression. Proceedings of International Conference on Spoken Language Processing (ICSLP), pp. 742-745. Chengalvarayan, R., 1998. Speaker adaptation using discriminative linear regression on time-varying mean parameters in trended HMM. IEEE Signal Processing Letters, vol. 5, pp. 63-65. Chesta, C., Siohan, O., Lee, C.-H., 1999. Maximum a posteriori linear regression for hidden Markov model adaptation. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), vol. 1, pp. 211-214. Chien, J.-T., 1999. Online hierarchical transformation of hidden Markov models for speech recognition. IEEE Transactions on Speech and Audio Processing, vol. 7, no. 6, pp. 656-667. Chien, J.-T., 2002. Quasi-Bayes linear regression for sequential learning of hidden Markov models. IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, pp. 268-278. Chien, J.-T., 2003. Linear regression based Bayesian predictive classification for speech recognition. IEEE Transactions on Speech and Audio Processing, vol. 11, no. 1, pp. 70-79. Chien, J.-T., Huang, C,-H., 2003. Bayesian learning of speech duration models. IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 558-567. Chien, J.-T., Liao, G.-H., 2001. Transformation-based Bayesian predictive classification using online prior evolution. IEEE Transactions on Speech and Audio Processing, vol. 9, no. 4, pp. 399-410. Chou, W., 1999. Maximum a posteriori linear regression with elliptically symmetric matrix variate priors. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), vol. 1, pp. 1-4. Chou, W., Juang, B.-H., 2003. Pattern Recognition in Speech and Language Processing, CRC Press. Chow, Y.-L., 1990. Maximum mutual information estimation of HMM parameters for continuous speech recognition using the N-best algorithm. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), pp. 701-704. DeGroot, M. H., 1970. Optimal Statistical Decisions, McGraw-Hill. Dempster, P., Laird, N. M., Rubin, D. B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (B), vol. 39, pp. 1-38. Dong, R., Zhu, J., 2002. One use of duration modeling for continuous digits speech recognition. Proceedings of International Conference on Spoken Language Processing (ICSLP), pp. 385-388. Duda, R. O., Hart, P. E., Stork, D. G., 2001. Pattern Classification, John Wiley & Sons, Inc. Fabian, T., Pfau, T., Ruske, G., 2001. Analysis of N-best output hypotheses for fast speech in large vocabulary continuous speech recognition. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), vol. 4, pp. 2535-2538. Faltlhauser, R., Ruske, G., Thomae, M., 2002. Towards the question: why has speaking rate such an impact on speech recognition performance. Proceedings of International Conference on Spoken Language Processing (ICSLP), pp. 2429-2432. Ferguson, J. D., 1980. Variable duration models for speech. Proceedings of Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143-179. Gao, S., Lee, C.-H., 2003. A discriminative decision tree learning approach to acoustic modeling. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), pp. 1833-1836. Gauvain, J. L., Lee, C.-H., 1994. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, vol. 2, pp. 291-298. Gillick, L., Cox, S. J., 1989. Some statistical issues in the comparison of speech recognition algorithms. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), pp. 532-535. Gopalakrishnan, P. S., Kanevsk, D., Nádas, A., Nahamoo, D., 1991. An inequality for rational function with applications to some statistical estimation problem. IEEE Transactions on. Information Theory, vol. 37, pp. 107-113. Gu, H., Tseng, C., Lee, L., 1991. Isolated-utterance speech recognition using hidden Markov models with bounded state durations. IEEE Transactions on Signal Processing, vol. 39, no. 8, pp. 1743-1752. Gunawardana, A., Byrne, W., 2001. Discriminative speaker adaptation with conditional maximum likelihood linear regression. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), pp. 1203-1206. He, X., Chou, W., 2003. Minimum classification error linear regression for acoustic model adaptation of continuous density HMMs. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 1, pp. 556-559. Huang, C.-H., Chien, J-T., 2005. Aggregate a posteriori linear regression for speaker adaptation. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 1, pp. 973-976. Hung, W.-W., Wang, H.-C., 1997. HMM retraining based on state duration alignment for noisy speech recognition. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), vol. 3, pp. 1519-1522. Huo, Q., Lee, C.-H., 1997. On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate. IEEE Transactions on Speech and Audio Processing, vol. 5, no. 2, pp. 161-172, March. Jeanrenaud, P., Eide, E., Chaudhari, U., McDonough, J., Ng, K., Siu, M., Gish, H., 1995. Reducing word error rate on conversational speech from the Switchboard corpus. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), pp. 53-56. Juang, B.-H., Chou, W., Lee, C.-H., 1997. Minimum classification error rate methods for speech recognition. IEEE Transactions on Speech and Audio Processing, vol. 5, no. 3, pp. 257-265. Kapadia, S., Valtchev, V., Young, S. J., 1993. MMI training for continuous phoneme recognition on the TIMIT database. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 2, pp. 491-494. Kuhn, R., Junqua, J.-C., Nguyen, P. and Niedzielski, N., 2000. Rapid speaker adaptation in eigenvoice space. IEEE Transactions on Speech and Audio Processing, vol. 8, pp. 695-707. Kuo, H.-K. J., Fosle-Lussier, E., Jiang, H., Lee, C.-H., 2000. Discriminative training of language models for speech recognition. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 1, pp. 325-328. Kuwabara, H., 1997. Acoustic and perceptual properties of phonemes in continuous speech as a function of speaking rate. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), pp. 1003-1006. Lai, W.-H., Chen, S.-H., 2002. Analysis of syllable duration models for Mandarin speech. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 1, pp. 497-500. Leggetter, C. J., Woodland, P. C., 1995. Maximum likelihood regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, vol. 9, pp. 171-185. Levinson, S. E., 1986. Continuously variable duration hidden Markov models for automatic speech recognition. Computer Speech and Language, vol. 1, pp. 29-45. Li, Q., Juang, B.-H., 2002. A new algorithm for fast discriminative training. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 1, pp. 97-100. Li, Q., Juang, B.-H., 2003. Fast discriminative training for sequential observations with application to speaker identification. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 2, pp. 397-400. Liu, C.-S., Lee, C.-H., Chou, W., Juang, B.-H., Rosenberg, A., 1995. A study on minimum error discriminative training for speaker recognition. The Journal of the Acoustic Society of America, vol. 97, no. 1. pp. 637-648. Ljolje, A., 1994. The importance of cepstral parameter correlations in speech recognition. Computer Speech Language, vol. 8, pp. 223-232. Mirghafori, N., Fosler, E., Morgan, N., 1996. Towards robustness to fast speech in ASR. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), pp. 335-338. Morgan, N., Fosler, E., Mirghafori, N., 1997. Speech recognition using on-line estimation of speaking rate. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), pp. 2079-2082. Nádas, A., 1983. A decision theoretic formulation of a training problem in speech recognition and a comparison of training by unconditional versus conditional maximum likelihood. IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 31, no. 4, pp. 814-817. Nguyen, P., et al., 1999. Maximum likelihood eigenspace and MLLR for speech recognition in noisy environments. in Proc. Eurospeech, pp. 2519-2522. Normandin, Y., Cardin, R., De Mori, R., 1994. High-performance connected digit recognition using maximum mutual information estimation. IEEE Transactions on Speech and Audio Processing, vol. 2, pp. 299-311. Povey, D., Woodland, P. C., 2002. Minimum phone error and I-smoothing for improved discriminative training. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 1, pp. 105-108. Power, K., 1996. Duration modeling for improved connected digit recognition. Proceedings of International Conference on Spoken Language Processing (ICSLP), pp. 885-888. Rabiner, L. R., Juang, B.-H., 1993. Fundamentals of Speech Recognition, Prentice-Hall, Inc. Rabiner, L. R., Juang, B.-H., Levinson, S. E., Sondhi, M. M., 1985. Recognition of isolated digits using hidden Markov models with continuous mixture densities. AT&T Technical Journal, vol. 64, no. 6, pp. 1211-1234. Rahim, M. G., Lee, C.-H., Juang, B.-H., 1997. Discriminative utterance verification for connected digit recognition. IEEE Transactions on Speech and Audio Processing, vol. 5, no. 3, pp. 266-277. Reichl, W., Ruske, G., 1995. Discriminative training for continuous speech recognition. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), vol. 1, pp. 537-540. Richardson, M., Hwang, M., Acero, A., Huang, X. D., 1999. Improvement on speech recognition for fast talkers. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), vol. 1, pp. 411-414. Russell, M. J., Moore, R. K., 1985. Explicit modeling of state occupancy in hidden Markov models for automatic speech recognition. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), pp. 5-8. Saul, L., Rahim, M., 1999. Modeling the rate of speech by Markov processes on curves. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), vol. 1, pp. 415-418. Siegler, M. A., Stern, R. M., 1995. On the effects of speech rate in large vocabulary speech recognition systems. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), pp. 612-615. Suaudeau, N., Andre-Obrecht, R., 1994. An efficient combination of acoustic and supra-segmental informations in a speech recognition system. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 1, pp. 65-68. Summerfield, Q., 1981. On articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology and Human Performance, vol. 7, pp. 1074-1095. Tuerk, A., Young, S., 1999. Modelling speaking rate using a between frame distance metric. Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), vol. 1, pp. 419-422. Turk, M. A., Pentland, A. P., 1991. Face Recognition Using Eigenfaces. Proceedings of IEEE Conference of Computer Vision and Pattern Recognition, pp. 586-591. Verhasselt, J. P., Martens, J.-P., 1996. A fast and reliable rate of speech detector. Proceedings of International Conference on Spoken Language Processing (ICSLP), pp. 2258-2261. Wang, H.-M., 2003. MATBN 2002: A Mandarin Chinese broadcast news corpus. in Proc. ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition, Tokyo. Wang, X., Pols, L. C. W., ten Bosch, L. F. M., 1996. Analysis of context-dependent segmental duration for automatic speech recognition. Proceedings of International Conference on Spoken Language Processing (ICSLP), pp. 1181-1184. Wang, L., Woodland, P. C., 2004. MPE-based discriminative linear transform for speaker adaptation. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 1, pp. 312-324. Ward, N., Nakagawa, S., 2002. Automatic user-adaptive speaking rate selection for information delivery. Proceedings of International Conference on Spoken Language Processing (ICSLP), pp. 549-552. Woodland, P. C., 1999. Speaker adaptation: Techniques and challenges. Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding, pp.85-90. Woodland, P. C., Povey, D., 2000. Large scale discriminative training for speech recognition. Proc. ITRW ASR, ISCA. Wu, J., Huo, Q., 2002. Supervised adaptation of MCE-trained CDHMMs using minimum classification error linear regression. IEEE Proceedings of International Conference on Acoustic, Speech and Signal Processing (ICASSP), vol. 1, pp. 605-608. Yoma, N. B., McInnes, F. R., Jack, M. A., Stump, S. D., Ling, L. L., 2001. On including temporal constraints in Viterbi alignment for speech recognition in noise. IEEE Transactions on Speech and Audio Processing, vol. 9, no. 2, pp. 179-182. Yoma, N. B., Sanchez, J. S., 2002. MAP speaker adaptation of state duration distributions for speech recognition. IEEE Transactions on Speech and Audio Processing, vol. 10, no. 7, pp. 443-450. Zilca, R. D., 2002. Text-independent speaker verification using utterance level scoring and covariance modeling. IEEE Transactions on Speech and Audio Processing, vol.10, no.6, pp.363-370.
|