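Author: 蔡承燁
Author (English): Tsai, Cheng-Yeh
Thesis title: 中英夾雜語音之階層式韻律架構建立與語音合成之應用
Thesis title (English): Prosody Hierarchy Construction for Mixed Chinese-English Spelling Speech and its Application to TTS
Advisor: 陳信宏
Advisor (English): Chen, Sin-Horng
Degree: Master's
Institution: National Chiao Tung University
Department: Institute of Communications Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2010
Academic year of graduation: 99 (ROC calendar)
Language: Chinese
Number of pages: 91
Keywords (Chinese, translated): mixed Chinese-English; prosody model; speech synthesis
Keywords (English): speech synthesis; prosody labeling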
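This thesis addresses mixed Chinese-English sentences, that is, sentences that are mainly Chinese but contain English letters. Using the relationships between linguistic features and acoustic features, it builds a prosody model for mixed Chinese-English speech and performs automatic prosody labeling. Two kinds of prosodic tags are labeled: break tags and prosodic states. Break tags mark the boundaries of prosodic units, while the sequence of prosodic states represents the variation of higher-level prosodic units. By analyzing the trained model parameters, the thesis examines the relationships among break tags, acoustic features, linguistic features, and higher-level prosodic states. The experimental results show that the higher-level prosodic states of English letters rise and fall with the prosodic variation of the overall Chinese utterance, whereas the break tags indicate stronger prosodic breaks at code-switch points. The prosodic hierarchy of noun phrases was also found to be highly correlated with their syntactic structure.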
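Finally, two prosody generation methods based on this model are proposed. The first predicts break tags to produce context-dependent information at the prosodic-hierarchy level and then generates prosodic parameters with HTS. The second applies the prosody model directly to predict prosodic parameters. Objective evaluation shows that the first method does improve on the prosodic parameters generated by conventional HTS, and that the second method is markedly effective at predicting syllable duration. Subjective evaluation also shows that the first method sounds the most natural, indicating that the break tags predicted in this work capture more natural prosodic rhythm.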
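The objective evaluation mentioned above compares predicted prosodic parameters with those of natural speech. This record does not name the metric used; the sketch below assumes root-mean-square error (RMSE) over syllable durations, a common choice for this kind of comparison. The function name `rmse` and all duration values are hypothetical and serve only as an illustration.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_error = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared_error / len(predicted))


# Hypothetical syllable durations in milliseconds (not taken from the thesis).
reference_ms = [200.0, 180.0, 220.0, 240.0]
predicted_ms = [190.0, 190.0, 230.0, 230.0]
print(rmse(predicted_ms, reference_ms))
```

A lower RMSE means the predicted durations lie closer to the natural ones; comparing this value across systems is one way the "objective" ranking of methods could be obtained.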

In this thesis, an unsupervised joint prosody labeling and modeling (PLM) method for mixed Chinese-English word-spelling speech is proposed. It labels an unlabeled corpus with two types of prosodic tags (the break type of each inter-syllable juncture and the prosodic state of each syllable) and builds four prosodic models simultaneously. The break tags delimit the prosodic constituents of a hierarchical prosody structure, and the prosodic states are used to construct the prosodic feature patterns of those constituents. The four prosodic models describe the relationships among the acoustic prosodic features, the prosodic tags of utterances, and the linguistic features of the associated texts. The experimental results showed that prosodic variation in English word spelling was influenced both by the prosodic state, which describes the underlying intonation, and by a Chinese tone-borrowing effect. The relationship between hierarchical noun phrase structure and the corresponding break types was also analyzed. The analysis suggested that the magnitude of the break type was highly correlated with the syntactic hierarchy in a noun phrase.
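To illustrate how break tags can delimit prosodic constituents, here is a minimal sketch. It assumes a hypothetical break inventory `B0` to `B4`, ordered from weakest to strongest; the record does not list the actual break types. Each syllable carries the break tag of the juncture that follows it, and a constituent ends wherever that break is at least as strong as a chosen level. The syllable sequence is invented for illustration.

```python
from typing import Dict, List

# Hypothetical break inventory, weakest to strongest (an assumption).
BREAK_STRENGTH: Dict[str, int] = {"B0": 0, "B1": 1, "B2": 2, "B3": 3, "B4": 4}


def split_by_breaks(syllables: List[str], breaks: List[str],
                    min_break: str) -> List[List[str]]:
    """Group syllables into units ending at breaks >= min_break.

    breaks[i] is the break tag of the juncture after syllables[i].
    """
    if len(syllables) != len(breaks):
        raise ValueError("one break tag is required per syllable")
    threshold = BREAK_STRENGTH[min_break]
    units: List[List[str]] = []
    current: List[str] = []
    for syllable, tag in zip(syllables, breaks):
        current.append(syllable)
        if BREAK_STRENGTH[tag] >= threshold:
            units.append(current)  # close the unit at a strong enough break
            current = []
    if current:  # trailing syllables with no closing break
        units.append(current)
    return units


# Invented example: a Chinese sentence containing the spelled letters "P" and "C".
syllables = ["wo", "yong", "P", "C", "shang", "wang"]
breaks = ["B0", "B2", "B1", "B2", "B0", "B4"]
print(split_by_breaks(syllables, breaks, "B2"))
print(split_by_breaks(syllables, breaks, "B4"))
```

Raising `min_break` yields fewer, larger units, which mirrors the idea of a hierarchy: stronger breaks bound higher-level constituents that contain the lower-level ones.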
Lastly, we propose two PLM-based prosody generation methods for a mixed Chinese-English word-spelling text-to-speech (TTS) system. In the first method, a break predictor is constructed with the CART method. The related linguistic features and the predicted break tags are then used to train an HMM-based text-to-speech system (HTS). In the second method, the PLM is used directly as a prosody generator. Experimental results confirmed that the first method was superior, in both objective and subjective tests, to the conventional HTS, which uses only linguistic features. In addition, the second method was significantly better than the conventional HTS at syllable duration prediction. We therefore conclude that the proposed PLM method was successful in prosody labeling and modeling for constructing a mixed Chinese-English word-spelling TTS system.
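The core step of CART when growing a classification tree is choosing the binary split that minimizes the weighted Gini impurity of the two child nodes. The sketch below shows that step for a single numeric feature. It is not the thesis's break predictor: the feature (syllables since the previous punctuation mark), the two labels, and the data are all hypothetical, and a full CART would repeat this search over many features and recurse on each child.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values: Sequence[float],
               labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Find the threshold t (rule: value <= t) with the lowest weighted Gini.

    Returns (None, parent_impurity) if no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best: Tuple[Optional[float], float] = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # cannot split between identical feature values
        left: List[str] = [label for _, label in pairs[:i]]
        right: List[str] = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((low + high) / 2, score)
    return best


# Hypothetical training data: one feature value and one break label per juncture.
syllables_since_punct = [1, 2, 3, 6, 7, 8]
break_labels = ["no_break"] * 3 + ["break"] * 3
threshold, impurity = best_split(syllables_since_punct, break_labels)
print(threshold, impurity)
```

In practice a tree-learning library would be used in place of this hand-written search; the sketch only makes explicit what "constructed with the CART method" involves at each node.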

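Chapter 1 Introduction
1.1 Research Motivation
1.2 Literature Review
1.3 Research Direction
1.4 Overview of the Mixed Chinese-English TTS System Architecture
1.5 Overview of the Speech Corpus
1.6 Chapter Overview
Chapter 2 HMM-based Mixed Chinese-English Speech Synthesizer
2.1 HMM-based Speech Synthesis System
2.2 Building the Baseline HMM-based Mixed Chinese-English Speech Synthesis System
2.2.1 Phone Models for Chinese and English Letters
2.2.2 Text Label Information and Question Set Design
Chapter 3 Mixed Chinese-English Prosody Model
3.1 Prosodic Characteristics of Mixed Chinese-English Speech
3.2 Hierarchical Prosody Structure
3.3 Chinese PLM Algorithm
3.4 Mixed Chinese-English PLM Algorithm
3.5 Training the Mixed Chinese-English Prosody Model
3.5.1 Initialization
3.5.2 Iteration
Chapter 4 Prosody Model Training Results and Analysis
4.1 Syllable Prosody Model
4.1.1.1 Affecting Patterns of Fundamental Frequency at the Syllable Level
4.1.1.2 Affecting Patterns of Syllable Duration at the Syllable Level
4.1.1.3 Affecting Patterns of Syllable Energy at the Syllable Level
4.1.2 Affecting Patterns of Higher-Level Prosodic States
4.2 Break Tag Acoustic Model
4.3 Prosodic State Transition Model
4.4 Break Tag Linguistic Model
4.5 Analysis of Prosody Labeling Results
Chapter 5 PLM-based Prosody Generator
5.1 Break Tag Prediction
5.1.1 All-in-one CART-based
5.1.2 Two-stage CART-based
5.2 Prosodic Parameter Prediction with PLM
5.3 Speech Synthesis Experimental Results and Analysis
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
Appendix 1
Appendix 2
Appendix 3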

