|
In this thesis, a Mandarin text-to-speech system is designed and implemented on the MS-Windows operating system. The 408 first-tone Mandarin syllables are adopted as the synthesis units. To synthesize the syllable signals, a time-domain processing method called "Time Proportionated Interpolation of Pitch Waveform (TPIPW)" is proposed. For the prosodic processing unit, a rule-based method proposed by other researchers is adopted with slight modifications. In TPIPW, the two parts of a syllable, i.e. the unvoiced part (e.g. voiceless consonants) and the voiced part (e.g. voiced consonants and vowels), are processed separately. The name of the method reflects the processing of the voiced part. With this method, a syllable's tone (or pitch contour), duration, and formant-frequency height can be controlled almost independently. In particular, the duration of a syllable can be freely changed to any value between half and twice its original length without notable side effects on the other two control factors. In addition, a function for raising or lowering formant-frequency values is provided to simulate adjustment of the vocal-tract length, so that the originally recorded male voice can be converted more naturally to a female voice. For the unvoiced part, signal waveforms are classified into two classes, and each class is processed differently. This method not only synthesizes clear and intelligible signals but also supports control of duration.
|