National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record (詳目顯示)

Author: 黃志賢
Author (English): Chih-Hsien Huang
Title: 貝氏鑑別性語者調適於大詞彙連續語音辨識
Title (English): Bayesian Discriminative Speaker Adaptation for Large Vocabulary Continuous Speech Recognition
Advisor: 簡仁宗
Advisor (English): Jen-Tzung Chien
Degree: Ph.D.
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系碩博士班)
Discipline: Engineering (工程學門)
Field: Electrical and Computer Engineering (電資工程學類)
Type of thesis: Academic thesis
Year of publication: 2006
Graduating academic year: 94 (2005-2006)
Language: English
Number of pages: 108
Keywords (Chinese): 聚集事後機率線性迴歸、近似貝氏估測、適應性區間模型、語者調適、鑑別性學習、貝氏學習、大詞彙連續語音辨識
Keywords (English): LVCSR; Bayesian Learning; Discriminative Learning; Speaker Adaptation; Adaptive Duration Model; Quasi-Bayes Estimate; AAPLR
Usage statistics:
  • Cited by: 0
  • Views: 256
  • Downloads: 32
  • Bookmarked: 1
  Automatic speech recognition, and ultimately automatic summarization, with minimal human intervention is the foremost goal of current speech recognition research. Recognition performance in the laboratory is generally quite high because the training and test data come from the same speech corpus. In real application environments, however, the test data are collected in the field, so there are inevitably mismatches with the training data, such as speaking rate, environmental noise and the channel effect of the recording equipment. The most widely used way to compensate for these mismatches is the speaker/environment adaptation mechanism. To improve recognition performance rapidly and effectively, the recognizer should ideally be equipped with rapid, sequential and discriminative adaptation capabilities.

  This dissertation proposes a series of speaker adaptation algorithms with Bayesian and discriminative adaptation mechanisms. The main purpose is to adapt the initially trained speech models toward the characteristics of the current test environment so as to improve speech recognition performance effectively. Three kinds of speech model parameters are adapted. The first is the speech duration model, for which we propose Bayesian duration modeling and adaptation algorithms. The emphasis is on an online adaptation mechanism for the duration parameters of hidden Markov model (HMM) based duration models, derived from the quasi-Bayes (QB) estimate. The adapted duration models are robust to nonstationary speaking rates and noisy environments. In this part of the study, the Gaussian, Poisson and gamma distributions are used to characterize the duration models, and a maximum a posteriori (MAP) based gamma duration model is also developed. To realize online learning of the duration models, we adopt the Poisson duration model together with a gamma prior, which belongs to the same conjugate prior family. One advantage of this mechanism is that the optimal QB duration parameters can be determined as the adaptation utterances arrive one by one; another is that an online updating mechanism based on the gamma prior statistics can be established. In this study, the QB parameter estimation is carried out through the EM algorithm. By adapting the conventional acoustic parameters and the duration parameters simultaneously, speech recognition performance is indeed improved.

  For rapid speaker adaptation based on eigenvoices, we also propose two improvements. First, we propose a maximum a posteriori eigen-decomposition algorithm, in which the required linear combination coefficients are estimated according to the MAP criterion. A Gaussian distribution is used to describe the prior of these combination coefficients, and this approach indeed performs better than the conventional maximum likelihood eigen-decomposition adaptation algorithm. Second, we propose a covariance matrix adaptation algorithm under the eigenvoice adaptation framework. The method uses principal component analysis to project the speaker-dependent speech models onto a smaller orthogonal parameter space, estimates the covariance matrices in that space, and finally obtains the adapted covariance matrices by transforming the matrices estimated in the reduced space back to the original space.

  Besides the two Bayesian adaptation algorithms described above, we further propose an algorithm with both Bayesian and discriminative adaptation mechanisms. The algorithm performs speaker adaptation by estimating cluster-dependent regression matrices. The regression matrices are adapted according to the criterion of maximizing the aggregate a posteriori probability. This criterion can be expressed as a classification error function that adopts the logarithm of the posterior distribution as the discriminant function. The aggregate a posteriori linear regression adaptation algorithm, which minimizes the classification errors of the adaptation data, is thus established. Because the prior distribution of the regression matrices is taken into account, the algorithm is equipped with Bayesian learning capability. We also show that the difference between this algorithm and maximum a posteriori linear regression lies in the treatment of the probability of the input observations (the evidence). Unlike the usual minimum classification error linear regression adaptation, the proposed method has a closed-form solution and therefore meets the requirement of rapid adaptation. The experimental results also show that aggregate a posteriori linear regression speaker adaptation indeed improves recognition performance over maximum likelihood linear regression, maximum a posteriori linear regression, minimum classification error linear regression and conditional maximum likelihood linear regression, at the cost of only a small amount of extra computation.
 
  In the experiments, we examine the incremental and discriminative speaker adaptation algorithms in a large vocabulary continuous speech recognition system. Mandarin broadcast news corpora are adopted for performance evaluation. Broadcast news speech is collected in a spontaneous speaking style and is recognized as one of the most challenging tasks among speech recognition applications. Moreover, the vocabulary is so large that the conventional full search algorithm is impractical; we therefore use the tree copy search algorithm to implement an LVCSR search with less computation and less memory consumption. The experimental results show that the proposed Bayesian duration model adaptation, Bayesian eigenvoice adaptation and aggregate a posteriori linear regression discriminative adaptation indeed improve speech recognition performance.
  Automatic recognition and summarization of continuous speech with high performance is the aim of much current speech-related research. In the laboratory, recognition performance is generally good because the training and test data come from the same environmental condition. However, there are many sources of mismatch between training and test data in real applications, such as speaking rate, environmental noise, channel effect, etc. To deal with these mismatches, the most popular technique is speaker/environment adaptation. More specifically, it is desirable that the continuous speech recognition system be equipped with rapid, sequential and discriminative adaptation capabilities.

  This dissertation proposes a series of Bayesian and discriminative adaptation methods for large vocabulary continuous speech recognition (LVCSR). We aim to adapt the speaker-independent models to the test environment/speaker. The first method deals with the mismatch of duration models. Bayesian speech duration modeling and learning is presented for hidden Markov model (HMM) based speech recognition. We focus on the sequential learning of HMM state durations using the quasi-Bayes (QB) estimate. The adapted duration models are robust to nonstationary speaking rates and noise conditions. In this study, the Gaussian, Poisson and gamma distributions are investigated to characterize the duration models, and the maximum a posteriori (MAP) estimate of the gamma duration model is developed. To exploit sequential learning, we adopt the Poisson duration model incorporated with a gamma prior density, which belongs to the conjugate prior family. When the adaptation data are sequentially observed, the gamma posterior density is produced, with twofold advantages. One is to determine the optimal QB duration parameters, which can be merged into the HMMs for speech recognition. The other is to build the updating mechanism of the gamma prior statistics for sequential learning. The EM algorithm is applied to fulfill the QB parameter estimation. The adaptation of the duration parameters and the remaining HMM parameters can be performed simultaneously.
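
  As a concrete illustration of the sequential QB learning described above, the sketch below shows the conjugate Poisson-gamma update for a single HMM state, under the simplifying assumption that the state durations are directly observed (in the dissertation the durations are hidden and the counts are accumulated through the EM algorithm). Function and variable names are illustrative, not taken from the dissertation.

# Sketch of the sequential quasi-Bayes (QB) update for a Poisson state-duration
# model with a conjugate gamma prior, assuming the durations have already been
# assigned to the state (e.g. by a Viterbi/EM alignment); names are illustrative.

def qb_update_poisson_duration(alpha, beta, durations):
    """One incremental adaptation step for a single HMM state.

    alpha, beta : current gamma hyperparameters (shape, rate) of the prior on
                  the Poisson duration rate lambda.
    durations   : state-duration counts observed in the new adaptation batch.
    Returns the updated hyperparameters and the QB (posterior-mode) estimate.
    """
    alpha_new = alpha + sum(durations)          # accumulate duration counts
    beta_new = beta + len(durations)            # accumulate number of observations
    lambda_qb = (alpha_new - 1.0) / beta_new    # mode of Gamma(alpha_new, beta_new)
    return alpha_new, beta_new, lambda_qb

# Example: a prior reflecting an average duration of about 5 frames, followed by
# two adaptation batches arriving sequentially.
alpha, beta = 5.0, 1.0
for batch in ([4, 6, 5], [7, 8]):
    alpha, beta, lam = qb_update_poisson_duration(alpha, beta, batch)
    print(alpha, beta, lam)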

  We also present two approaches to improve the performance of eigenvoice-based speaker adaptation. First, we present maximum a posteriori eigen-decomposition (MAPED), where the linear combination coefficients for eigenvoice decomposition are estimated according to the MAP criterion. By incorporating prior knowledge of the decomposition coefficients through a Gaussian distribution, MAPED achieves better performance than maximum likelihood eigen-decomposition (MLED) when the adaptation data are sparse. Second, we exploit the adaptation of HMM covariance matrices in the framework of eigenvoice speaker adaptation. Our method uses principal component analysis (PCA) to project the speaker-dependent HMM parameters onto a smaller orthogonal model space, where the covariance matrices can be calculated reliably from the observations. The adapted HMM covariance matrices are then obtained by transforming the covariance matrices in the reduced space back to the original space.
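
  To make the role of the Gaussian prior in MAPED more tangible, the sketch below uses a simplified linear-Gaussian surrogate in which a speaker supervector is modeled as the mean voice plus a combination of eigenvoices plus isotropic noise; under that assumption the MAP weights have a ridge-like closed form that shrinks toward the prior when adaptation data are sparse. The actual MLED/MAPED estimates in the dissertation are derived from HMM sufficient statistics via an EM auxiliary function, and all names in the sketch are illustrative.

# Simplified sketch of MAP eigen-decomposition (MAPED) versus ML eigen-decomposition
# (MLED), assuming a toy linear-Gaussian surrogate: supervector ~ mean_voice + E^T w
# plus isotropic noise, with a zero-mean Gaussian prior on w.
import numpy as np

def eigenvoices(speaker_supervectors, k):
    """PCA of speaker-dependent supervectors -> mean voice and k eigenvoices."""
    X = np.asarray(speaker_supervectors, dtype=float)   # shape (num_speakers, dim)
    mean_voice = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean_voice, full_matrices=False)
    return mean_voice, Vt[:k]                            # eigenvoices as rows (k, dim)

def mled_weights(E, mean_voice, s):
    """ML estimate of the combination weights (ordinary least squares)."""
    return np.linalg.lstsq(E.T, s - mean_voice, rcond=None)[0]

def maped_weights(E, mean_voice, s, noise_var=1.0, prior_var=1.0):
    """MAP estimate with Gaussian prior w ~ N(0, prior_var*I): ridge-like closed form."""
    A = E @ E.T + (noise_var / prior_var) * np.eye(E.shape[0])
    return np.linalg.solve(A, E @ (s - mean_voice))

# Usage sketch: given training-speaker supervectors X_train and an adaptation
# supervector s_new built from a few utterances of the target speaker,
#   mean_voice, E = eigenvoices(X_train, k=10)
#   w = maped_weights(E, mean_voice, s_new)
#   adapted_supervector = mean_voice + E.T @ w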

  To establish Bayesian discriminative adaptation, we further present a new linear regression adaptation algorithm for LVCSR. The cluster-dependent regression matrices are estimated from speaker-specific adaptation data by maximizing the aggregate a posteriori probability, which can be expressed in the form of a classification error function adopting the logarithm of the posterior distribution as the discriminant function. Accordingly, aggregate a posteriori linear regression (AAPLR) is developed for discriminative adaptation, where the classification errors of the adaptation data are minimized. Because the prior distribution of the regression matrix is involved, AAPLR is equipped with Bayesian learning capability. We demonstrate that the difference between AAPLR discriminative adaptation and maximum a posteriori linear regression (MAPLR) adaptation lies in the treatment of the evidence. Different from minimum classification error linear regression (MCELR), AAPLR has a closed-form solution and thus fulfills the requirement of rapid adaptation. The proposed AAPLR speaker adaptation is evaluated by comparison with maximum likelihood linear regression (MLLR), MAPLR, MCELR and conditional maximum likelihood linear regression (CMLLR).
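
  The "classification error function adopting the log posterior as the discriminant function" mentioned above can be written, in its generic MCE-style form, as the smoothed misclassification measure sketched below. The exact AAPLR objective and its closed-form solution for the regression matrices are what Chapter 5 derives; this sketch only illustrates the generic form, with illustrative names.

# Generic MCE-style misclassification measure and smoothed error, with the
# discriminant g_j taken to be the log posterior of class j; minimizing this
# smoothed error over the adaptation data is the discriminative ingredient.
import math

def misclassification_measure(log_posteriors, correct, eta=1.0):
    """d = -g_correct + (1/eta) * log( (1/(M-1)) * sum_{j != correct} exp(eta * g_j) )."""
    g_c = log_posteriors[correct]
    comp = [g for j, g in enumerate(log_posteriors) if j != correct]
    m = max(comp)  # shift for numerical stability
    lse = m + math.log(sum(math.exp(eta * (g - m)) for g in comp) / len(comp)) / eta
    return -g_c + lse

def smoothed_error(d, gamma=1.0):
    """Sigmoid-smoothed 0/1 loss of the misclassification measure."""
    return 1.0 / (1.0 + math.exp(-gamma * d))

# Example: log posteriors of three candidate classes for one adaptation sample,
# where class 0 is the correct transcription (d < 0 means correct classification).
d = misclassification_measure([-1.2, -2.5, -3.0], correct=0)
print(d, smoothed_error(d))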

  In the experiments, we examine the incremental and discriminative speaker adaptation algorithms for large vocabulary continuous speech recognition. We adopt Mandarin broadcast news databases for system evaluation. Broadcast news speech is collected in a spontaneous speaking style and is known as one of the most challenging tasks among speech recognition applications. Also, the vocabulary size is so large that a full search algorithm is impractical in real implementations. We apply the tree copy search algorithm to implement a search with less computation and less storage for LVCSR. The experimental results show that the proposed Bayesian duration adaptation, Bayesian eigenvoice adaptation and AAPLR discriminative adaptation do improve recognition performance.
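
  The tree copy search mentioned above is built on a lexical prefix tree in which words sharing initial phone sequences share nodes, so the acoustic scores of common prefixes are computed once per tree copy rather than once per word. The toy sketch below only illustrates the construction of such a prefix tree; the lexicon entries and names are illustrative, not taken from the dissertation.

# Illustrative sketch of the lexical prefix tree underlying tree copy search.
def build_lexical_tree(lexicon):
    """lexicon: dict mapping word -> list of phone symbols.  Returns a nested-dict
    tree in which each node maps a phone to a child node, and words are attached
    at the node where their pronunciation ends."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node["children"].setdefault(ph, {"children": {}, "words": []})
        node["words"].append(word)
    return root

toy_lexicon = {
    "SPEECH": ["s", "p", "iy", "ch"],
    "SPEAKER": ["s", "p", "iy", "k", "er"],
    "SEARCH": ["s", "er", "ch"],
}
tree = build_lexical_tree(toy_lexicon)
# "SPEECH" and "SPEAKER" share the arcs s -> p -> iy; in tree copy search one
# copy of this tree is instantiated per active linguistic context (for example,
# per predecessor word under a bigram language model).
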
Chinese Abstract (中文摘要)
Abstract
Acknowledgement (誌謝)
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF NOTATIONS

Chapter 1 Introduction
1.1 Motivation
1.2 Related Works
1.3 Outline of This Dissertation
Chapter 2 Basics of Speech Recognition
2.1 Statistical Speech Recognition
2.1.1 Bayesian Theory
2.1.2 Preprocessing of Speech, Speech Units and Lexicons
2.2 Hidden Markov Models in Speech Recognition
2.3 Large Vocabulary Continuous Speech Recognition
2.3.1 Tree Organization of Lexicons
2.3.2 One Pass Search Algorithm
2.3.3 Tree Copy Search
2.3.4 Look-ahead and Pruning
2.4 Speaker Adaptation
Chapter 3 Bayesian Learning of Speech Duration Models
3.1 Parametric Duration Modeling
3.1.1 ML Parameter Estimation
3.1.2 ML Estimation for Different Duration Parameters
3.2 Bayesian Learning of Duration Models
3.2.1 MAP and QB Parameter Estimation
3.2.2 MAP Estimation for Gamma Duration Parameters
3.2.3 QB Estimation for Gaussian Duration Parameters
3.2.4 QB Estimation for Poisson Duration Parameters
3.3 Experiments
3.3.1 Experimental Setup
3.3.2 Implementation Issues
3.3.3 Evaluation of Different ML Duration Models
3.3.4 Evaluation of MAP Batch Learning for Different Duration Models
3.3.5 Evaluation of QB Sequential Learning for Different Duration Models
3.3.6 Evaluation of Recognition and Adaptation Time Costs for Different Duration Models
3.4 Summary
Chapter 4 A New Eigenvoice Approach to Speaker Adaptation
4.1 Eigenvoice
4.2 Maximum a Posteriori Eigen-decomposition
4.3 Eigenvoice-based Covariance Adaptation
4.4 Experiments
4.5 Summary
Chapter 5 Aggregate a Posteriori Linear Regression
5.1 Review of Discriminative Training and Linear Regression Algorithm
5.1.1 MCE and MMI Discriminative Training
5.1.2 MLLR and MAPLR Adaptation
5.1.3 MCELR, CMLLR and MPELR Adaptation
5.2 Aggregate a Posteriori Linear Regression Adaptation
5.2.1 AAP Probability
5.2.2 AAPLR Criterion
5.2.3 Relation between AAPLR and MAPLR Criteria
5.2.4 Derivation of AAPLR Solution
5.2.5 Comparison of Different Linear Regression Adaptation
5.3 Experiments
5.3.1 Speech Database and Experimental Setup
5.3.2 Implementation Issues
5.3.3 Linear Regression Adaptation Versus Recognition Performance and Adaptation Time
5.3.4 Evaluation of MLLR, MAPLR, MCELR, CMLLR and AAPLR for Unsupervised Adaptation
5.3.5 Evaluation of Speech Recognition Performance for Multiple Speaker Adaptation
5.4 Summary
Chapter 6 Conclusion
Bibliography
Author's Biographical Notes (作者簡歷)