Author (English name): Yi-chin Huang
Thesis title (English): Emotional Text-to-Speech System of Baseball Broadcast
Advisor (English name): Chia-ping Chen
Keywords (English): speech synthesis, emotion conversion, prosodic rule
In this study, we implement an emotional text-to-speech system for the limited domain of online play-by-play baseball game summaries. The Chinese Professional Baseball League (CPBL) is our target domain. Our goal is synthesized speech that is fluent and carries appropriate emotion. The system first parses the input text and keeps track of the on-court information, e.g., the number of runners and which bases are occupied, the number of outs, the score of each team, and the batter's performance in the game. It also adds additional sentences to the input text.
Then, the system outputs neutral synthesized speech from the text with the additional sentences inserted, and subsequently converts it to emotional speech. Our approach to speech conversion is to simulate a baseball broadcaster. Specifically, our system learns and uses the prosody of a broadcaster. To learn the prosody, we record two baseball games and analyze the prosodic features of the emotional utterances.
These observations are used to derive prosodic rules for emotion conversion. A subjective evaluation is used to study the subjects' preferences regarding the additional sentence insertion and the emotion conversion in the system.
1 Introduction
1.1 Background
1.2 Motivation
1.3 Thesis Organization
2 Review
2.1 Concatenation-Based TTS
2.2 Speech Emotion Conversion
3 Basic Text-to-Speech Module
3.1 Speech Inventory
3.2 Pre-Processing of the Synthesis Units
3.2.1 Pitch Tracking
3.2.2 Energy Normalization
3.3 Basic TTS Framework
3.4 Synthesizer
4 Emotional Speech Corpus and Analysis
4.1 Emotional Speech Corpus Construction
4.2 Classification of Emotional Corpus
4.3 F0 Contour Analysis
4.4 Stressed Syllables
5 Additional Sentence Generation Module
5.1 On-court Information Parser
5.2 Additional Sentence Insertion
6 Experiment and Evaluation
6.1 Speech Emotion Conversion Module
6.1.1 Text Analyzer
6.1.2 F0 Extraction
6.1.3 Rhythmic Stress
6.1.4 Semantic Stress
6.1.5 Speech Synthesizer
6.2 Evaluation
6.2.1 Perceptual Experiment
6.2.2 Preference Test
6.2.3 Additional Sentence Preference Test
6.3 Discussion
6.4 Cross-fading Effect
7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
