National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

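Graduate student: 謝鴻文
Graduate student (English name): Hong-Wen Sie
Thesis title: 幾個應用於連續語音音節切割之演算法的效能比較及系統實作 (Performance Comparison of Several Algorithms for Syllable Segmentation of Continuous Speech, with System Implementation)
Thesis title (English): Several Algorithms of Syllable Segmentation on Continuous Speech
Advisor: 呂仁園
Advisor (English name): Ren-Yuan Lyu
Degree: Master's
Institution: Chang Gung University (長庚大學)
Department: Graduate Institute of Electrical Engineering
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2005
Academic year of graduation: 93 (ROC calendar, i.e., 2004-2005)
Language: Chinese
Pages: 125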
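Chinese keywords: 音節切割 (syllable segmentation), 動態時間扭曲法 (dynamic time warping), 強制對位 (forced alignment), 知識特徵演算法 (knowledge-feature algorithm)
English keywords: Syllable Segmentation, Dynamic Time Warping, Forced Alignment, Knowledge-Feature Algorithm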
Usage statistics:
  • Cited: 3
  • Views: 226
  • Downloads: 0
  • Bookmarked: 1
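Chinese abstract (translated): This thesis proposes a new algorithm, Knowledge-Feature (KF) based syllable segmentation, whose main purpose is to segment continuous or spontaneous speech into syllable units. It uses many time-domain and frequency-domain acoustic features of the speech signal to discriminate between segments with different characteristics. Human segmentation knowledge, taken from a standard operating procedure for manual segmentation, supplies additional information to the proposed algorithm. The experimental platform is the laboratory's tri-lingual speech database, covering Taiwanese, Hakka, and Mandarin. Three segmentation approaches are compared: manual segmentation, supervised hidden Markov model (HMM) based segmentation, and the newly proposed unsupervised KF-based segmentation. The boundary insertion rates show that computers pick out more silence segments than humans do: the average is 5.15% for three phonetics experts, 23.41% for the supervised HMM segmenter, and 30.47% for the proposed unsupervised KF segmenter. The mean matching differences of the unsupervised KF segmenter and the supervised HMM segmenter are 22.37 ms and 22.94 ms, respectively. This is a good result relative to the supervised HMM, because the KF segmenter is unsupervised: unlike the HMM segmenter, it does not need the phoneme transcription of the utterance to segment accurately. Dynamic time warping (DTW) was evaluated in two variants, one assuming silence between syllables and one assuming none. The former has an excessive insertion rate of 38.56%, and the latter an excessive deletion rate of 40.04%. DTW therefore has a silence problem in continuous-speech syllable segmentation: whether silence is assumed present or absent, either the insertion rate or the deletion rate becomes too high. In addition, the overall mean matching differences of the two variants, 35.40 ms and 52.50 ms respectively, are worse than those of the supervised HMM segmenter and the unsupervised KF segmenter. We therefore combined the unsupervised KF segmenter with DTW. In the experiments this combination achieves the best performance measure (G), 9.14%, and a mean matching difference of 20.99 ms, closer to manual segmentation than either the supervised HMM segmenter or the unsupervised KF segmenter.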
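The abstract reports boundary insertion rates, deletion rates, and mean matching differences without defining how they are computed. The sketch below shows one plausible way to score automatic boundaries against manual ones. The greedy one-to-one matching, the 50 ms tolerance, and normalising both rates by the number of reference (manual) boundaries are assumptions for illustration, not the thesis's definitions; `score_boundaries` and `BoundaryScores` are hypothetical names.

```python
from dataclasses import dataclass
import math


@dataclass
class BoundaryScores:
    insertion_rate: float               # unmatched hypothesis boundaries / reference boundaries
    deletion_rate: float                # unmatched reference boundaries / reference boundaries
    mean_matching_difference_ms: float  # mean |hyp - ref| over matched pairs


def score_boundaries(reference_ms, hypothesis_ms, tolerance_ms=50.0):
    """Score automatic (hypothesis) boundaries against manual (reference) ones.

    Hypothetical helper: each reference boundary is greedily matched to the
    nearest still-unused hypothesis boundary within tolerance_ms. The tolerance
    and the normalisation by len(reference_ms) are illustrative assumptions.
    """
    if not reference_ms:
        raise ValueError("need at least one reference boundary")
    unused = sorted(hypothesis_ms)
    diffs = []
    for ref in sorted(reference_ms):
        nearest = min(unused, key=lambda h: abs(h - ref), default=None)
        if nearest is not None and abs(nearest - ref) <= tolerance_ms:
            diffs.append(abs(nearest - ref))
            unused.remove(nearest)  # enforce one-to-one matching
    n_ref = len(reference_ms)
    return BoundaryScores(
        insertion_rate=len(unused) / n_ref,
        deletion_rate=(n_ref - len(diffs)) / n_ref,
        mean_matching_difference_ms=sum(diffs) / len(diffs) if diffs else math.nan,
    )


if __name__ == "__main__":
    manual = [0, 250, 510, 800]            # made-up boundary times in ms
    automatic = [10, 240, 400, 530, 1200]  # made-up boundary times in ms
    print(score_boundaries(manual, automatic))
    # insertion_rate=0.5, deletion_rate=0.25, mean_matching_difference_ms about 13.33
```

With these made-up times, the manual boundary at 800 ms has no automatic boundary within 50 ms, so it counts as a deletion, while the automatic boundaries at 400 ms and 1200 ms count as insertions.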
English abstract: In this thesis, a new approach called Knowledge-Feature (KF) based syllabic segmentation is proposed for segmenting continuous or spontaneous speech into syllabic units. It adopts many discriminative acoustic features in both the temporal and spectral domains of the speech signal. Human knowledge derived from a standard operating procedure (SOP) for manual segmentation is also incorporated into the proposed algorithm as far as possible. Experiments were conducted on a tri-lingual speech database of syllabic languages, Taiwanese, Hakka, and Mandarin Chinese, using three different approaches: manual segmentation, supervised Hidden Markov Model (HMM) based segmentation, and the newly proposed KF based segmentation. The boundary insertion rates, 5.15% for human segmenters, 23.41% for the HMM segmenter, and 30.47% for the KF segmenter, show that computers tend to pick out more silence segments than human beings do. The mean matching differences of KF and HMM, 22.37 ms and 22.94 ms respectively, indicate that KF performs about as well as HMM. This is encouraging because KF is unsupervised, i.e., unlike HMM, it does not require the phoneme transcription sequence to segment accurately. Two dynamic time warping (DTW) variants were also compared: DTW with silence between syllables has an insertion rate of 38.56%, and DTW without silence between syllables has a deletion rate of 40.04%, both higher than the other methods. DTW therefore has a silence problem in continuous-speech syllable segmentation: whether silence is assumed to exist or not, either the insertion rate or the deletion rate is too high. Their mean matching differences, 35.40 ms and 52.50 ms respectively, are also worse than those of the HMM and KF segmenters. We therefore combined KF and DTW, and the experiments show promising results: the combination achieves the best performance measure G, 9.14%, and a mean matching difference of 20.99 ms, closer to the SOP (manual) results than either the HMM or the KF segmenter.
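Dynamic time warping aligns two sequences of frames by dynamic programming, and a syllable boundary can then be read off the resulting warping path. The sketch below is a textbook DTW with a symmetric step pattern and backtracking, plus a hypothetical helper `project_boundaries` that maps known boundary frames in a template onto the input. How the thesis builds its templates, including whether silence is inserted between syllables (the two DTW variants in the abstract), is not modelled. The scalar frames and absolute-difference distance are simplifying assumptions; real systems would compare feature vectors such as MFCCs.

```python
import math


def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Classic DTW with steps (i-1, j), (i, j-1), (i-1, j-1), all of weight 1.

    Returns (total_cost, path), where path is a list of (i, j) frame-index
    pairs running from (0, 0) to (len(x) - 1, len(y) - 1).
    """
    n, m = len(x), len(y)
    # D[i][j] = minimal cost of aligning x[:i] with y[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path


def project_boundaries(path, template_boundaries):
    """Hypothetical helper: for each boundary frame index in the template x,
    return the first frame of y aligned with it on the warping path."""
    first_match = {}
    for i, j in path:
        first_match.setdefault(i, j)
    return [first_match[b] for b in template_boundaries]


if __name__ == "__main__":
    template = [0, 1, 2]       # toy 1-D frames; boundaries assumed at frames 1 and 2
    utterance = [0, 1, 1, 2]   # the same pattern with frame 1 stretched
    cost, path = dtw(template, utterance)
    print(cost, path)                        # 0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
    print(project_boundaries(path, [1, 2]))  # [1, 3]
```

The unit-weight step pattern is only one common choice; slope constraints or a global band around the diagonal would change both the cost and the path.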
Table of Contents

Advisor's Recommendation Letter
Oral Defense Committee Approval
Authorization Letter
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chinese Abstract
English Abstract
Chapter 1 Introduction
1.1 Research Motivation
1.2 Related Work
1.3 Research Methods
1.4 Problem Description
1.5 Chapter Overview
Chapter 2 Manual Segmentation Procedure and Steps
2.1 Basic Speech Concepts
2.2 Manual Segmentation Process
Chapter 3 Supervised Automatic Syllable Segmentation
3.1 Forced Alignment with Hidden Markov Models
3.1.1 Corpus Preparation
3.1.2 Feature Extraction
3.1.3 Hidden Markov Models
3.1.4 Viterbi Algorithm
3.2 Principles of Dynamic Time Warping
Chapter 4 Unsupervised Knowledge-Feature Based Syllable Segmentation
4.1 Acoustic Features of Speech
4.1.1 Time-Domain Features
4.1.2 Frequency-Domain Features
4.2 Knowledge-Feature Algorithm Flow
Chapter 5 System Implementation
5.1 System Architecture
5.2 System Interface
Chapter 6 Conclusions and Future Work
6.1 Analysis of Experimental Results
6.2 Analysis of Test Corpus Results
6.3 Error Analysis
6.4 Future Work
References
Appendix: Formosa Continuous Speech Syllable Segmentation Software Development Kit