[1]蔡松?, Improvement of a GMM-based voice conversion method, Master's thesis, Dept. of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2009.
[2]王讚緯, A voice conversion system using histogram equalization and target frame selection, Master's thesis, Dept. of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2014.
[3]張家維, A voice conversion method using principal component vector projection and least-mean-square mapping, Master's thesis, Dept. of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2012.
[4]H. Valbret, E. Moulines, and J.P. Tubach, “Voice transformation using PSOLA technique,” in 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-92), vol. 1, San Francisco, CA, USA, 23-26 Mar. 1992, pp. 145-148.
[5]D. Erro, E. Navas, and I. Hernaez, “Parametric voice conversion based on bilinear frequency warping plus amplitude scaling,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 3, pp. 556-566, 2013.
[6]X. Tian, Z. Wu, S.W. Lee, and E.S. Chng, “Correlation-based frequency warping for voice conversion,” in 2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), Singapore, 12-14 Sept. 2014, pp. 211-215.
[7]M. Narendranath, H.A. Murthy, S. Rajendran, and B. Yegnanarayana, “Transformation of formants for voice conversion using artificial neural networks,” Speech Communication, vol. 16, no. 2, pp. 207-216, 1995.
[8]F.L. Xie, Y. Qian, Y. Fan, F.K. Soong, and H. Li, “Sequence error (SE) minimization training of neural network for voice conversion,” in Interspeech 2014, pp. 2283-2287.
[9]Y. Stylianou, O. Cappe, and E. Moulines, “Continuous probabilistic transform for voice conversion,” IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 131-142, 1998.
[10]T. Toda, A.W. Black, and K. Tokuda, “Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.
[11]T. Dutoit, A. Holzapfel, M. Jottrand, A. Moinet, J. Perez, and Y. Stylianou, “Toward a voice conversion system based on frame selection,” in 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, vol. 4, Honolulu, HI, 15-20 Apr. 2007, pp. 513-516.
[12]H.Y. Gu and S.F. Tsai, “A voice conversion method combining segmental GMM mapping with target frame selection,” Journal of Information Science and Engineering, vol. 31, no. 2, pp. 609-626, 2015.
[13]蔡仲明, Language identification of Mandarin, Min-Nan, and Hakka speech based on GMM and PPM models, Master's thesis, Dept. of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2007.
[14]S. Young, G. Evermann, T. Hain, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK Version 3.2.1), Cambridge University Engineering Department, 2002.
[15]K. Sjolander and J. Beskow, WaveSurfer, Centre for Speech Technology, KTH. Available: http://www.speech.kth.se/wavesurfer/.
[16]吳昌益, A study of Mandarin speech synthesis using a spectrum progression model, Master's thesis, Dept. of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2007.
[17]K. Sayood, Introduction to Data Compression, 2nd ed., San Francisco, CA: Morgan Kaufmann Publishers, 2000.
[18]W.J. Teahan, “Probability estimation for PPM,” in New Zealand Computer Science Research Student Conference, NZCSRSC'95, Apr. 1995.
[19]L. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, NJ, USA: Prentice Hall, 1993.
[20]T. Caliński and J. Harabasz, “A dendrite method for cluster analysis,” Communications in Statistics, vol. 3, no. 1, pp. 1-27, 1974.
[21]R.A. Redner and H.F. Walker, “Mixture densities, maximum likelihood and the EM algorithm,” SIAM Review, vol. 26, no. 2, pp. 195-239, 1984.
[22]A. Kain, High resolution voice transformation, PhD dissertation, Oregon Health & Science University, 2001.
[23]T. Toda, A.W. Black, and K. Tokuda, “Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.
[24]洪尉翔, A method for improving the quality of synthesized speech using MGE-trained HMM models and global variance matching, Master's thesis, Dept. of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 2015.
[25]E. Godoy, O. Rosec, and T. Chonavel, “Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, pp. 1313-1323, 2012.
[26]Y. Stylianou, Harmonic plus noise models for speech, combined with statistical methods, for speech and speaker modification, PhD thesis, Ecole Nationale Superieure des Telecommunications, 1996.