This thesis presents the design and implementation of an RNN-based speech recognizer for large-vocabulary isolated Mandarin words, running on a Pentium PC with a SoundBlaster add-on card under Windows 95. The system can be functionally divided into two parts: pre-processing and word recognition. In pre-processing, a small RNN first discriminates input speech from the background silence. Driven by the output of this RNN, a finite state machine then determines all word boundaries. State-dependent constraints are added to skip some feature-extraction computations, which relieves the load on the CPU. Once the end-of-utterance state is entered, word recognition is performed with an RNN base-syllable recognizer and an RNN tone recognizer to determine the N best word candidates. Because the pre-processing runs in real time, an average waiting time of only 0.876 seconds for word recognition is achieved. Recognition rates of 20.6%, 51.6%, 88.4%, 95.2%, and 100% are obtained for one- to five-syllable words, respectively.
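The abstract does not specify the finite state machine's states or thresholds, so the following is only a minimal sketch of how an FSM can turn frame-level speech/silence scores into word boundaries. It assumes the small RNN emits one speech probability per frame. The state names, the 0.5 decision threshold, the `min_speech`/`min_silence` frame counts, and the `needs_features` helper are hypothetical choices made for illustration, not the thesis's actual design.

```python
from enum import Enum, auto
from typing import List, Tuple


class State(Enum):
    SILENCE = auto()         # waiting for speech to begin
    POSSIBLE_START = auto()  # speech-like frames seen, onset not yet confirmed
    IN_WORD = auto()         # confirmed inside a word
    POSSIBLE_END = auto()    # silence-like frames seen inside a word


def needs_features(state: State) -> bool:
    # Hypothetical state-dependent constraint: recognition features are
    # only computed outside pure silence, saving CPU time.
    return state is not State.SILENCE


def detect_words(speech_prob: List[float], threshold: float = 0.5,
                 min_speech: int = 3, min_silence: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices (end exclusive) of detected words.

    speech_prob: assumed per-frame speech probabilities from the small RNN.
    min_speech: frames of speech needed to confirm a word onset (hypothetical).
    min_silence: frames of silence needed to declare end of utterance (hypothetical).
    """
    state = State.SILENCE
    start = end_candidate = run = 0
    segments: List[Tuple[int, int]] = []
    for t, p in enumerate(speech_prob):
        is_speech = p >= threshold
        if state is State.SILENCE:
            if is_speech:
                state, start, run = State.POSSIBLE_START, t, 1
        elif state is State.POSSIBLE_START:
            if is_speech:
                run += 1
                if run >= min_speech:
                    state = State.IN_WORD
            else:
                state = State.SILENCE  # too short: treat as a noise burst
        elif state is State.IN_WORD:
            if not is_speech:
                state, end_candidate, run = State.POSSIBLE_END, t, 1
        elif state is State.POSSIBLE_END:
            if is_speech:
                state = State.IN_WORD  # short pause inside a multi-syllable word
            else:
                run += 1
                if run >= min_silence:
                    # End of utterance: this is where recognition would start.
                    segments.append((start, end_candidate))
                    state = State.SILENCE
    # Close a word that is still open when the input ends.
    if state is State.IN_WORD:
        segments.append((start, len(speech_prob)))
    elif state is State.POSSIBLE_END:
        segments.append((start, end_candidate))
    return segments
```

For example, a two-syllable word with a short inter-syllable pause is kept as one segment, while a two-frame noise burst is rejected. The hangover counts are what let the FSM tolerate pauses between syllables without splitting the word.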