Author (English): Hong-Yan Lee
Thesis title (English): Pitch Detection with Tone Model and Tone Recognition in Mandarin Speech
Advisor (English): Sin-Horng Chen
Keywords (English): pitch; tone recognition
This thesis proposes a new model-based pitch tracking scheme that tracks pitch and recognizes tones simultaneously using a statistical prosody model of Mandarin speech. Guided by the prosody model, pitch tracking becomes more reliable, which reduces both half-pitch and double-pitch errors. Experimental results show that the proposed pitch estimation scheme reduces the gross pitch error (GPE) of pitch detection from 1.121% to 0.918%, and that the half-pitch and double-pitch error rates both decrease as well. The scheme also achieves a tone recognition rate of 70%.
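To make the reported error measures concrete, here is a minimal sketch of how gross, half-pitch, and double-pitch error rates are commonly computed from a reference and an estimated pitch track. The 20% tolerance, the convention that 0 Hz marks an unvoiced frame, and the function name `gross_pitch_errors` are assumptions chosen for illustration; the abstract does not state the exact definitions used in the thesis.

```python
def gross_pitch_errors(reference_hz, estimated_hz, tolerance=0.2):
    """Return gross, half-pitch, and double-pitch error rates.

    Assumptions (not taken from the thesis): a frame is a gross error when
    the estimate deviates from the reference by more than `tolerance`
    (relative), and only frames voiced in both tracks (value > 0) count.
    """
    total = gross = half = double = 0
    for ref, est in zip(reference_hz, estimated_hz):
        if ref <= 0 or est <= 0:
            continue  # skip frames that either track marks as unvoiced
        total += 1
        if abs(est - ref) / ref <= tolerance:
            continue  # estimate is close enough to the reference
        gross += 1
        # Classify the gross error by proximity to ref/2 or 2*ref.
        if abs(est - ref / 2) / (ref / 2) <= tolerance:
            half += 1
        elif abs(est - 2 * ref) / (2 * ref) <= tolerance:
            double += 1
    if total == 0:
        raise ValueError("no frames are voiced in both tracks")
    return {
        "gpe": gross / total,
        "half": half / total,
        "double": double / total,
        "frames": total,
    }


# Toy data: one good frame, one halving, one doubling, one other gross
# error, and one frame the estimator marked unvoiced (ignored).
reference = [100.0, 100.0, 100.0, 100.0, 200.0]
estimated = [101.0, 50.0, 205.0, 130.0, 0.0]
rates = gross_pitch_errors(reference, estimated)

# Relative GPE reduction implied by the figures in the abstract.
relative_reduction = (1.121 - 0.918) / 1.121  # about 0.181, i.e. roughly 18%
```

Under these assumptions, the drop from 1.121% to 0.918% is a relative GPE reduction of roughly 18%.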
Chinese Abstract
English Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
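Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Direction
1.3 Chapter Overview
Chapter 2 Pitch Estimation by Instantaneous Frequency Extraction
2.1 Instantaneous Frequency
2.2 Instantaneous Spectrum
2.3 Generating Pitch Candidates from the Instantaneous Spectrum
2.3.1 Transform Function
2.3.2 Pitch Decision Curve
2.3.3 Generating Pitch Candidates from the Pitch Decision Curve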
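Chapter 3 Mandarin Tone Recognition and Pitch Contour Construction
3.1 Characteristics of Mandarin Tones
3.2 Construction of the Tone Model and Prosody Model
3.3 Training of the Tone Model and Prosody Model
3.3.1 Model Initialization
3.3.2 Model Training Procedure
3.4 Integrating Tone Recognition and Pitch Contour Construction
3.4.1 Building Syllable-Level Pitch Contour Candidates
3.4.2 Tone Recognition and Pitch Contour Search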
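Chapter 4 Experimental Results and Analysis
4.1 Speech Corpus
4.2 Parameter Tuning
4.3 Comparison of Mean Pitch Values Across Syllables and Across Frames
4.3.1 Comparison of Mean Pitch Across Syllables ( )
4.3.2 Comparison of Pitch Values Across Frames ( )
4.4 Tone Recognition in Continuous Speech
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References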
