臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Author: 黃奕欽
Author (English): Yi-chin Huang
Title: 模擬棒球廣播之情緒化語音合成系統
Title (English): Emotional Text-to-Speech System of Baseball Broadcast
Advisor: 陳嘉平
Advisor (English): Chia-ping Chen
Degree: Master's
Institution: National Sun Yat-sen University
Department: Department of Computer Science and Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2008
Graduation academic year: 96
Language: English
Pages: 62
Keywords (Chinese): 接合式語音合成; 韻律的調整; 語音合成
Keywords (English): speech synthesis; emotion conversion; prosodic rule
Usage statistics:
  • Cited: 0
  • Views: 285
  • Rating: (none)
  • Downloads: 41
  • Saved to personal bibliographies: 0
Abstract (translated from the Chinese): This study builds an emotional text-to-speech system for baseball play-by-play reporting. The goal is for the synthesized speech to imitate, as closely as possible, the reporting style of a radio baseball broadcaster; both the emotional quality of the broadcaster's speech and the extra on-court information a broadcaster adds during a game must be accounted for in building the system. To provide the listener with this extra on-court information in spoken form, the system parses the on-line play-by-play text and continuously updates the game state from it: the number of runners and which bases are occupied, the current number of outs, the score, and the batter's performance in earlier at-bats. This information is used to generate additional sentences, which are inserted at suitable positions in the original text. The augmented text is first synthesized by a basic concatenative speech synthesizer. For prosodic adjustment, prosodic rules are learned from a corpus of two actual baseball broadcasts, and these rules are applied to the basic synthesizer's output so that the prosody of the synthesized speech conveys emotion. Finally, once the system is complete, subjective listening tests are conducted to assess listener satisfaction with its output.
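The prosodic-adjustment step described in the abstract can be sketched as follows. This is an illustrative assumption, not the thesis implementation: a single hypothetical rule that raises the mean F0 and widens the F0 range of a neutral contour, in the spirit of the rules the authors learn from their two recorded broadcasts. The scale factors are placeholders, not values from the thesis.

```python
# Illustrative sketch (not the thesis code): apply one hypothetical prosodic
# rule to a neutral F0 contour to approximate an excited broadcast style.
# Frames with F0 <= 0 are treated as unvoiced and left untouched.

def apply_prosodic_rule(f0_contour, pitch_scale=1.25, range_scale=1.5):
    """Raise the mean F0 and expand the F0 excursion of the voiced frames."""
    voiced = [f for f in f0_contour if f > 0]
    if not voiced:
        return list(f0_contour)
    mean_f0 = sum(voiced) / len(voiced)
    out = []
    for f in f0_contour:
        if f <= 0:
            out.append(f)  # unvoiced frame: keep as-is
        else:
            # shift the mean up, then stretch the excursion around it
            out.append(mean_f0 * pitch_scale + (f - mean_f0) * range_scale)
    return out

# Toy contour in Hz (0 = unvoiced frame); mean of voiced frames is 200 Hz.
contour = [0, 180, 200, 220, 210, 0, 190]
excited = apply_prosodic_rule(contour)
```

A real system would apply different rules per emotion class and stressed syllable (as the thesis does via its corpus analysis); this sketch only shows the shape of such a rule.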
In this study, we implement an emotional text-to-speech system for the limited domain of on-line play-by-play baseball game summaries. The Chinese Professional Baseball League (CPBL) is our target domain. Our goal is for the output synthesized speech to be fluent and carry appropriate emotion. The system first parses the input text and tracks the on-court information, e.g., the number of runners and which bases are occupied, the number of outs, the score of each team, and the batter's performance earlier in the game. The system then inserts additional sentences into the input text.
Next, the system produces neutral synthesized speech from the augmented text and subsequently converts it to emotional speech. Our approach to this conversion is to simulate a baseball broadcaster: the system learns and applies the prosody of a real broadcaster. To learn this prosody, we record two baseball games and analyze the prosodic features of the emotional utterances.
These observations are used to derive prosodic rules for emotion conversion. A subjective evaluation studies the subjects' preferences regarding the additional-sentence insertion and the emotion conversion performed by the system.
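The on-court information parser and additional-sentence generation described above can be sketched like this. The event names, state-update logic, and summary template here are hypothetical simplifications for illustration, not the parser rules from the thesis (which operates on Chinese play-by-play text).

```python
# Hypothetical sketch of the on-court information tracker: maintain a running
# game state while consuming play-by-play events, and produce an extra summary
# sentence to insert into the text at a suitable point.

class GameState:
    def __init__(self):
        self.bases = [False, False, False]   # 1B, 2B, 3B occupied?
        self.outs = 0
        self.score = {"home": 0, "away": 0}

    def update(self, event):
        if event == "single":
            # simplified: every runner advances one base, batter takes first
            self.bases = [True, self.bases[0], self.bases[1]]
        elif event == "out":
            self.outs += 1
        elif event == "homerun":
            runs = 1 + sum(self.bases)       # batter plus all runners score
            self.score["away"] += runs
            self.bases = [False, False, False]

    def summary_sentence(self):
        runners = sum(self.bases)
        return (f"{runners} on base, {self.outs} out, "
                f"score {self.score['away']}-{self.score['home']}.")

state = GameState()
for ev in ["single", "out", "homerun"]:
    state.update(ev)
print(state.summary_sentence())  # prints: 0 on base, 1 out, score 2-0.
```

In the actual system, sentences like this are generated in the target language and spliced into the parsed play-by-play text before synthesis; the sketch only shows the state-tracking idea.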
1 Introduction
1.1 Background
1.2 Motivation
1.3 Thesis Organization
2 Review
2.1 Concatenation-Based TTS
2.2 Speech Emotion Conversion
3 Basic Text-to-Speech Module
3.1 Speech Inventory
3.2 Pre-Processing of the Synthesis Units
3.2.1 Pitch Tracking
3.2.2 Energy Normalization
3.3 Basic TTS Framework
3.4 Synthesizer
4 Emotional Speech Corpus and Analysis
4.1 Emotional Speech Corpus Construction
4.2 Classification of Emotional Corpus
4.3 F0 Contour Analysis
4.4 Stressed Syllables
5 Additional Sentence Generation Module
5.1 On-court Information Parser
5.2 Additional Sentence Insertion
6 Experiment and Evaluation
6.1 Speech Emotion Conversion Module
6.1.1 Text Analyzer
6.1.2 F0 Extraction
6.1.3 Rhythmic Stress
6.1.4 Semantic Stress
6.1.5 Speech Synthesizer
6.2 Evaluation
6.2.1 Perceptual Experiment
6.2.2 Preference Test
6.2.3 Additional Sentence Preference Test
6.3 Discussion
6.4 Cross-fading Effect
7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Work