|
|