Thesis Title (English): Normalization and Prediction of Syllable Initial and Final Durations for Speech Synthesis
Advisor (English): Hung-Yan Gu
Oral Defense Committee (English): Hsin-Min Wang, Ming-Shing Yu, Kuo-Liang Chung, Hung-Yan Gu
Keywords (English): Speech synthesis, Duration prediction, Normalization
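[...] the Weka software to build classification and regression trees, which are used to predict the initial and final durations of a sentence to be synthesized, [...]
[...] to normalize the initial and final durations, and then use the Weka software to build the initial and final duration [...]
[...] MOS-rating listening tests. From the average scores of the naturalness comparison of the synthesized speech, we find that [...]
[...] closer to the way humans speak than the audio files synthesized by the previous method. In the naturalness MOS rating,
the average scores of our synthesized audio files all reach 3.5 points or higher, and the best
audio file exceeds 4 points, which indicates that most listeners approve of our synthesized audio files, [...]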
In this thesis, normalization methods for syllable initial
and final durations are studied. In addition, a feature set is
designed for Weka to construct classification and regression
trees (CART) that predict the syllable initial and final
durations of a text sentence to be synthesized. By combining
the two studies (duration normalization and CART-based duration
prediction), we aim to increase the naturalness of the
synthesized speech, especially in the arrangement of initial
and final durations. In the training stage, the original
durations of syllable initials and finals are obtained by
reading the label file corresponding to each training sentence.
Then the method proposed here, two-level standard deviation
matching, is used to normalize the durations of syllable
initials and finals. Next, Weka is used to construct two CART
trees, one for the durations of syllable initials and one for
those of syllable finals. In the synthesis stage, we develop
program modules that predict the duration of a syllable initial
or final according to the two CART trees constructed by Weka.
These program modules are then integrated into the speech
synthesis system developed by previous researchers. Hence, the
system can synthesize speech signals according to the duration
normalization and prediction methods studied in this thesis.
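
As a rough illustration of the normalization idea, the sketch below applies a single-level standard deviation matching to a list of durations: each value is shifted and scaled so that the set takes on a reference mean and standard deviation. The abstract does not give the formula of the two-level method, so the function name `match_std`, the reference statistics, and the sample durations are assumptions made for illustration only, not the procedure used in the thesis.

```python
from statistics import mean, pstdev


def match_std(durations, ref_mean, ref_std):
    """Shift and scale durations so their mean and standard deviation
    equal ref_mean and ref_std (hypothetical single-level matching)."""
    src_mean = mean(durations)
    src_std = pstdev(durations)
    if src_std == 0:
        # All durations identical: nothing to scale, map to the reference mean.
        return [ref_mean for _ in durations]
    return [(d - src_mean) / src_std * ref_std + ref_mean for d in durations]


# Made-up final durations in milliseconds (not data from the thesis).
finals_ms = [120.0, 150.0, 180.0, 210.0]
normalized = match_std(finals_ms, ref_mean=200.0, ref_std=40.0)
print([round(x, 1) for x in normalized])
```

After the call, the normalized values keep their relative ordering while their mean and population standard deviation equal the assumed reference values.
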
Using the synthesized speech, we conduct two types of
listening tests: a naturalness comparison and a naturalness
MOS evaluation. According to the average scores of the
naturalness comparison test, the duration prediction method
studied here is indeed better than the method provided by
previous researchers. This is because the arrangement of
syllable initial and final durations produced by our method is
more natural. In addition, according to the average scores of
the naturalness MOS evaluation, most participants agree that
the speech synthesized with our duration prediction method is
very close to the corresponding speech uttered by a real
speaker. In detail, the average scores of our synthetic
utterances are all greater than 3.5 points, and one of them is
greater than 4 points. Therefore, the naturalness of the speech
synthesized with our duration normalization and prediction
methods is very close to that of speech uttered by a real
person.
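
For readers unfamiliar with MOS (mean opinion score) tests, the score of an utterance is the average of the listeners' ratings, commonly on a 1-to-5 scale. The sketch below shows that arithmetic only; the 1-to-5 range, the ratings, and the file names are assumptions invented for illustration and are not data from the thesis.

```python
from statistics import mean


def mos(ratings):
    """Return the mean opinion score for one utterance (ratings on a 1-5 scale)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from five listeners for two synthesized files.
scores = {
    "utt_a.wav": [4, 4, 3, 4, 3],
    "utt_b.wav": [5, 4, 4, 4, 4],
}
mos_by_file = {name: mos(r) for name, r in scores.items()}
for name, value in mos_by_file.items():
    print(f"{name}: MOS = {value:.1f}")
```
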
Abstract
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Literature Review
1.2.1 Review of Speech Synthesis Methods
1.2.2 Prosodic Parameter Generation
1.2.3 Duration Normalization Methods
1.2.4 Duration Prediction Methods
1.3 Research Method
1.4 Overview of Thesis Chapters
Chapter 2 Preparation of the Training Corpus and Feature Set
2.1 Corpus Preparation
2.2 Attributes of the Feature Set
Chapter 3 Normalization of Syllable Durations and Final Durations
3.1 Previous Duration Normalization Methods
3.2 Regression Coefficient Estimation
3.3 Syllable Duration Normalization
3.5 Final Duration Normalization: Final Standard Deviation Matching
3.6 Final Duration Normalization: Two-Level Standard Deviation Matching
3.7 Final Duration Normalization: Cascaded Normalization
3.8 Experiments on the Normalization Methods
Chapter 4 Prediction of Final Durations
4.1 Introduction to the Weka Software
4.2 The CART Algorithm
4.3 Steps for Classification and Regression Analysis with Weka
4.4 Duration Prediction Error Measurements with Weka
4.4.1 Choice of Weka Algorithm
4.4.2 Weka Duration Prediction Experiments for the Syllable Duration Normalization Methods
4.4.3 Weka Duration Prediction Experiments for the Final Duration Normalization Methods
4.4.4 Weka Duration Prediction Experiments for Initial Durations
4.4.5 Comparison of TLSDM+Weka(M5P) with Other Duration Prediction Methods
4.5 Implementation of Program Modules for Predicting Initial and Final Durations
Chapter 5 Speech Synthesis System Integration
5.1 Functions of the Original System
5.2 Adding the Duration Prediction Module
5.3 System Interface
5.4 Testing of Duration Prediction
5.5 Listening Tests
5.5.1 Naturalness Comparison of Synthesized Speech
5.5.2 Naturalness MOS Rating of Synthesized Speech
Chapter 6 Conclusion
References
