|
This thesis presents a CELP-based text-to-speech conversion system. The system uses 1410 Mandarin Chinese monosyllables as its basic synthesis units. The Code Excited Linear Prediction (CELP) algorithm is applied in the speech synthesizer to achieve a high compression rate together with good speech quality. To improve the naturalness of the synthetic speech, a prosodic modification method is proposed to replace the traditional rule-based approach to pronunciation. First, a total of 12 representative pitch contour patterns are defined to describe the behavior of the four lexical tones and the neutral tone in Mandarin Chinese. Observation shows that the acoustic properties of a syllable may be affected by its concatenation context within a sentence. Consequently, a Bayesian network is employed to model the relation between pitch contour fluctuation and linguistic features. The network is trained on a set of sentence utterances and provides appropriate prosodic information for adjusting the synthetic speech during synthesis. The synthetic speech was evaluated by 20 subjects. The results indicate an average correct rate of 96.65% for intelligibility, and 84.31% of the mean opinion scores for naturalness were above the "fair" level.
|