National Digital Library of Theses and Dissertations in Taiwan

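Graduate student: 陳冠廷
Graduate student (English name): Kuan-Ting Chen
Thesis title: 基於混合式方法的華語語料庫之自動切音研究
Thesis title (English): A Hybrid Approach to Automatic Speech Segmentation for Mandarin Speech Corpora
Advisor: 張智星
Advisor (English name): Jyh-Shing Roger Jang
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Information Systems and Applications
Academic discipline: Computer science
Academic field: Systems design
Thesis type: Academic thesis
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: English
Number of pages: 47
Keywords: automatic segmentation; phonetic labeling; HMM-based recognizer; sequential forward selection; k-nearest neighbor rule; leave-one-out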
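Precise phonetic labeling is critical for corpus-based text-to-speech (TTS) systems. However, automatic segmentation by Viterbi forced alignment is not accurate enough, and an automatic labeling method that suits one language cannot always be applied directly to another. We therefore propose a new boundary refinement method for Mandarin speech corpora.
The method divides Mandarin speech sounds into four broad categories according to their phonetic characteristics. For each combination of categories at a boundary, we use pattern recognition techniques to select suitable acoustic features and refine the boundary separately. Refinement results for the connected-speech case ("periodic voiced + periodic voiced") were not satisfactory, so we handle that case separately with new formant-based features.
To verify the feasibility of the proposed method, we compare it with an earlier boundary refinement method based on CART (Classification and Regression Tree) and report extensive experimental results. According to these experiments, our boundary refinement method achieves high segmentation accuracy.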
Precise phone/syllable boundary labeling of the utterances in a speech corpus plays an important role in constructing a corpus-based TTS (text-to-speech) system. However, automatic labeling based on Viterbi forced alignment does not always produce satisfactory results. Moreover, a suitable labeling method for one language does not necessarily produce desirable results for another language. Hence in this thesis, we propose a new procedure for refining the boundaries of utterances in a Mandarin speech corpus. This procedure employs different sets of acoustic features for four different phonetic categories. In addition, a new scheme is proposed to deal with the “periodic voiced + periodic voiced” case, which produced most of the segmentation errors in our experiment. Several experiments were conducted to demonstrate the feasibility of the proposed approach.
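The keywords and Section 3.3.3 of the outline name sequential forward selection (SFS), the k-nearest neighbor rule (KNNR), and leave-one-out (LOO) as the tools used to pick acoustic features. The following is a minimal sketch of how these three pieces can fit together; it is not the thesis's implementation. The function names, the toy data, the labels, and the stopping rule (stop when accuracy no longer improves) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def loo_knn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                     features: Sequence[int], k: int = 1) -> float:
    """Leave-one-out accuracy of a k-NN classifier using only `features`."""
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from held-out sample i to every other sample.
        neighbors = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), j)
            for j in range(len(X)) if j != i
        )
        # Majority vote among the k nearest neighbors.
        votes = {}
        for _, j in neighbors[:k]:
            votes[y[j]] = votes.get(y[j], 0) + 1
        if max(votes, key=votes.get) == y[i]:
            correct += 1
    return correct / len(X)


def sequential_forward_selection(X: Sequence[Sequence[float]], y: Sequence[int],
                                 k: int = 1) -> Tuple[List[int], float]:
    """Greedily add the feature that most improves LOO k-NN accuracy.

    Assumed stopping rule: stop as soon as no candidate improves accuracy.
    """
    remaining = list(range(len(X[0])))
    selected: List[int] = []
    best_acc = 0.0
    while remaining:
        # Score each candidate feature added to the current subset;
        # ties are broken in favor of the lower feature index.
        acc, feat = max(
            ((loo_knn_accuracy(X, y, selected + [f], k), f) for f in remaining),
            key=lambda t: (t[0], -t[1]),
        )
        if selected and acc <= best_acc:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_acc = acc
    return selected, best_acc


# Hypothetical toy data: feature 0 separates the two classes,
# feature 1 is misleading, and feature 2 is constant.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 1.0],
    [0.3, 9.0, 1.0],
    [0.9, 5.2, 1.0],
    [1.0, 1.1, 1.0],
    [1.1, 9.1, 1.0],
]
y = [0, 0, 0, 1, 1, 1]

selected, acc = sequential_forward_selection(X, y)
print(selected, acc)  # prints: [0] 1.0
```

On this toy data only feature 0 is kept, because adding either of the other features does not raise the leave-one-out accuracy. In a per-category setup like the one the abstract describes, a selection of this kind would presumably be run once for each phonetic category transition, each with its own training samples.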
1 INTRODUCTION
1.1 MOTIVATION
1.2 RELATED WORK
1.3 SUMMARY OF THE THESIS
1.4 ORGANIZATION OF THE THESIS
2 HMM BASED RECOGNIZER
2.1 SPEECH CORPUS INTRODUCTION
2.2 FROM ORTHOGRAPHIC TRANSCRIPTION TO PHONETIC TRANSCRIPTION
2.3 TRAINING DIFFERENT ACOUSTIC MODELS OF HMM-BASED RECOGNIZERS
3 DESIGN OF THE REFINEMENT PROCEDURE
3.1 FOUR PHONETIC CATEGORIES
3.2 FEATURE DEFINITION
3.2.1 Bisector Frequency
3.2.2 Burst Degree
3.3 FEATURE SELECTION BASED ON PHONETIC CATEGORIES
3.3.1 Defining candidate boundaries for training data
3.3.2 Feature definition
3.3.3 Feature selection by SFS, KNNR and LOO
3.3.4 Classification rates for phonetic category transitions
3.4 FURTHER IMPROVEMENT FOR “PERIODIC VOICED + PERIODIC VOICED” CASES
3.4.1 “Divide and conquer” method
3.4.2 A heuristic approach for group W
4 EXPERIMENT RESULTS AND ERROR ANALYSIS
4.1 THE PERFORMANCE OF DIFFERENT ACOUSTIC MODELS FOR LABELING THE TRAIN-455 CORPUS
4.2 A COMPARISON OF THE SEGMENTATION ACCURACY BETWEEN FORCED ALIGNMENT AND OUR REFINEMENT PROCEDURE
4.3 A COMPARISON WITH CART-BASED REFINEMENT PROCEDURE AND OUR REFINEMENT PROCEDURE
4.3.1 CART-Based Refinement Procedure
4.3.2 The Comparison between our refinement method and CART-based method
4.4 RESULTS AND DISCUSSIONS
5 CONCLUSIONS
A APPENDIX
A.1 AN OVERVIEW OF THE CART METHODOLOGY
A.1.1 Splitting Rules
A.1.2 Class assignment
A.1.3 Decide when to stop splitting
A.2 DISCUSSION
BIBLIOGRAPHY