Author: 廖元甫 (Yuan-Fu Liao)
Title: 以模組化遞迴類神經網路為基礎之中文語音辨認
Title (English): Modular Recurrent Neural Networks-based Mandarin Speech Recognition
Advisor: 陳信宏 (Sin-Horng Chen)
Degree: Doctor of Philosophy
Institution: National Chiao Tung University (國立交通大學)
Department: Department of Communication Engineering (電信工程系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic dissertation
Year of publication: 1998
Academic year of graduation: 87
Language: Chinese
Pages: 96
Keywords (Chinese): 中文語音辨認; 模組化遞迴類神經網路
Keywords (English): Mandarin speech recognition; modular recurrent neural network
Cited by: 3
Views: 163
Downloads: 0
Bookmarked: 0
This dissertation investigates recurrent neural network (RNN)-based methods for Mandarin speech recognition. It first proposes an RNN-based pre-classifier that divides the input speech into three stable classes, namely initial, final, and silence, plus one transient class; during recognition, the search range of the stable portions is then narrowed to speed up the hidden Markov model (HMM) recognizer. Single-speaker continuous speech recognition experiments show that it can be used together with the beam search algorithm and, at the cost of only a 0.1% drop in recognition rate, further removes 38.7% of the Gaussian probability functions to be evaluated and 35.1% of the HMM states to be searched by the conventional beam search. It next proposes a modular RNN-based recognizer for isolated Mandarin syllables, which divides the complex Mandarin syllable recognition task into five subtasks, namely initial discrimination, final discrimination, tone discrimination, broad-class weighting/segmentation, and initial sub-class weighting; each subtask is handled by a dedicated RNN, and their outputs are directly combined into syllable recognition scores to complete the syllable recognition task. The recognizer is further extended to recognize in the forward and backward time directions simultaneously. Multi-speaker isolated-syllable recognition experiments show that it outperforms the state-of-the-art HMM recognizer trained with the minimum classification error criterion: the base-syllable recognition rate rises from 76.8% to 82.8%, and the syllable recognition rate from 70.8% to 76.3%. Finally, the modular RNN isolated-syllable recognizer is extended with a syllable-boundary detection module and a multi-level pruning recognition search algorithm for continuous Mandarin base-syllable recognition. Single-speaker continuous speech recognition experiments show that it outperforms the state-of-the-art HMM recognizer trained with the minimum classification error criterion in both system complexity and recognition rate: the base-syllable recognition rate improves from 80.9% for the maximum-likelihood-trained HMM and 84.3% for the minimum-classification-error-trained HMM to 85.8%, and the multi-level pruning search reduces the numbers of syllable states to be searched and of syllable state transitions to be considered to 53.5% and 25.3%, respectively. The proposed methods are therefore well suited to Mandarin speech recognition.
In this dissertation, three recurrent neural network (RNN)-based speech recognition schemes are proposed. The first is an RNN-based pre-classifier for improving the recognition speed of the HMM method. It pre-classifies the input speech into three stable classes, namely initial, final, and silence, and one transient class, and then imposes tighter constraints on the recognition search for frames assigned to the stable classes so as to prune unlikely paths. Experimental results confirmed that it can be used in conjunction with the beam search algorithm: the computational cost of the beam search is further reduced by discarding an additional 38.7% of the searched states and eliminating the likelihood calculations of an additional 35.1% of the Gaussian components, at the cost of only a 0.1% degradation in recognition rate. This confirms the efficiency of the proposed fast recognition method.

The second is a modular RNN (MRNN)-based method for isolated Mandarin syllable recognition. It employs the divide-and-conquer principle to split the complicated task of recognizing 1280 syllables into five subtasks: three discrimination subtasks for 100 initials, 39 finals, and 5 tones, respectively, and two broad-class classification subtasks, one for the three speech broad classes of initial, final, and silence, and one for 9 initial sub-classes. Five RNNs are then used to tackle the subtasks separately, and their outputs are directly combined to form the discriminant functions of all 1280 syllables. The recognizer is further extended to include two MRNNs operating in the forward-time and backward-time directions. Experimental results on a multi-speaker syllable recognition task confirmed that it outperformed the MCE/GPD-trained HMM method in both recognition complexity and accuracy: the base-syllable and syllable recognition rates were improved from 76.8% and 70.8% for the MCE/GPD-trained HMM to 82.8% and 76.3% for the MRNN system.

The third is an MRNN-based method for continuous Mandarin base-syllable recognition. It extends the preceding MRNN method for isolated Mandarin syllable recognition with a syllable-boundary detection module and a multi-level pruning recognition search algorithm. Experimental results on a speaker-dependent continuous speech recognition task showed that the proposed method also outperformed the MCE/GPD-trained HMM method: the base-syllable recognition rate was improved from 80.9% for the ML-trained HMM and 84.3% for the MCE/GPD-trained HMM to 85.8% for the MRNN system. In addition, only 53.5% of the base-syllable states and 25.3% of the base-syllable transitions had to be considered in the multi-level pruning search, with no degradation in recognition accuracy. From these results we conclude that the RNN-based approach is very promising for both isolated and continuous Mandarin speech recognition.
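
As an illustration of the pre-classification idea described above, the following minimal Python sketch shows how a stable frame label could be used to prune HMM search states. The function and variable names are hypothetical and do not come from the dissertation; this is only a sketch of the technique, not the author's implementation.

def prune_states_by_broad_class(frame_label, active_states, state_broad_class):
    """Keep only the HMM states whose broad class matches a stable frame label.

    frame_label       : "initial", "final", "silence", or "transient"
    active_states     : state ids currently inside the beam
    state_broad_class : dict mapping state id -> "initial" / "final" / "silence"
    """
    if frame_label == "transient":
        # Transient frames are ambiguous, so no extra pruning is applied to them.
        return list(active_states)
    # Stable frames keep only states of the matching broad class, on top of
    # whatever beam pruning the search already performs.
    return [s for s in active_states if state_broad_class[s] == frame_label]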
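
The divide-and-conquer combination in the second scheme can likewise be sketched in a few lines. The additive combination below is an assumption for illustration only; the dissertation combines all five RNN outputs (including the broad-class and initial sub-class weights, which are omitted here) into the syllable discriminant functions in its own way.

import numpy as np

def syllable_scores(initial_scores, final_scores, tone_scores, syllable_table):
    """Assemble one discriminant score per syllable from the subtask RNN outputs.

    initial_scores : (num_initials,) output of the initial-discrimination RNN
    final_scores   : (num_finals,)   output of the final-discrimination RNN
    tone_scores    : (num_tones,)    output of the tone-discrimination RNN
    syllable_table : list of (initial_idx, final_idx, tone_idx) per syllable
    """
    scores = np.empty(len(syllable_table))
    for k, (i, f, t) in enumerate(syllable_table):
        # Each syllable inherits the scores of its own initial, final and tone.
        scores[k] = initial_scores[i] + final_scores[f] + tone_scores[t]
    return scores

# Toy usage with the sizes quoted in the abstract (100 initials, 39 finals, 5 tones):
# table = [(3, 10, 0), (3, 10, 1)]
# best = int(np.argmax(syllable_scores(np.random.rand(100),
#                                      np.random.rand(39),
#                                      np.random.rand(5), table)))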
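
Finally, the role of the syllable-boundary detection module in the third scheme can be hinted at with a sketch that restricts where new syllable hypotheses may start. The tolerance window and helper name are hypothetical, and the actual multi-level pruning algorithm involves further levels beyond this one.

def allowed_syllable_starts(num_frames, boundary_frames, tolerance=2):
    """Return the set of frames at which a new syllable hypothesis may start.

    num_frames      : total number of frames in the utterance
    boundary_frames : frame indices flagged by a syllable-boundary detector
    tolerance       : how many frames around each boundary remain admissible
    """
    allowed = set()
    for b in boundary_frames:
        # Only frames close to a detected boundary may open a new syllable,
        # which removes many syllable states and transitions from the search.
        for t in range(max(0, b - tolerance), min(num_frames, b + tolerance + 1)):
            allowed.add(t)
    return allowed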
Cover
Chinese Abstract
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Background Information about Mandarin Speech Recognition
1.2 Organization of the Dissertation
Chapter 2 The RNN-Based Pre-Classification Scheme for Fast Continuous Mandarin Speech Recognition
2.1 The Proposed Pre-Classification Scheme
2.2 Integrating the FSM with the DP Search
2.3 Experimental Results
2.4 Discussions and Summary
Chapter 3 The MRNN-Based Isolated Mandarin Syllable Recognition Scheme
3.1 The Preliminary Study
3.2 The Proposed MRNN-Based Scheme
3.3 Evaluations
3.4 Summary
Chapter 4 The MRNN-Based Continuous Mandarin Base-syllable Recognition Scheme
4.1 The Proposed Method
4.2 The Bottom-Up Hierarchical Training Scheme
4.3 Simulations
4.4 Conclusions
Chapter 5 Conclusions
Appendix A The syllable-level MCE/GPD algorithm for MRNN training
Bibliography
Vita
Publication List of Yuan-Fu Liao