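Author: Yu-Kai Kang (康育楷)
Thesis title: Detection and Correction of Syllable Contraction in Spontaneous Speech Recognition (自發性語音辨識中音節合併現象之偵測與修正)
Advisor: Chung-Hsien Wu (吳宗憲)
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Computer Science and Information Engineering (Master's and Doctoral Program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2008
Graduation academic year: 96 (ROC calendar)
Language: Chinese
Pages: 80
Keywords (Chinese): 音長資訊; 音節合併; 自發性語音辨識
Keywords (English): spontaneous speech recognition; duration information; syllable contraction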
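In recent years, automatic speech recognition (ASR) for read speech has matured. In everyday spoken conversation, however, words are not articulated clearly and carefully, and ASR performance drops sharply. Among the many factors that degrade spontaneous speech recognition, a faster speaking rate shortens syllable durations and can even produce syllable contraction, which makes the ASR output erroneous and unreadable.
This thesis aims to detect and correct syllable contraction. It proposes readjusting syllable boundaries and selecting suitable syllable contraction candidates according to their frequency of occurrence and rank; these candidates are then used to detect syllable contraction. In the word graph built from the recognition results, syllable contraction words are added at the positions where contraction may have occurred. Based on graphical model theory, a syllable contraction duration model is proposed that considers the distribution of frame-to-state alignment lengths within the acoustic model under syllable contraction, together with the conditional probability that syllable contraction produces a variant. The posterior probability of each word is then recomputed in order to correct the final result.
In the experiments, the proposed method was evaluated on the Mandarin Conversational Dialogue Corpus recorded by Academia Sinica. Accuracy on syllable contraction words improved by about 22%, and by about 41% when the method was combined with a multi-syllable acoustic model. The final syllable and word recognition rates improved by about 1.7% and 2.3%, respectively, and by 2.9% and 3.6% when the multi-syllable acoustic model was also used. These results indicate that the syllable contraction duration model does help correct recognition errors caused by syllable contraction and raises the final recognition rate.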
Recently, automatic speech recognition (ASR) technology for read speech has attained a high level of maturity. However, in spontaneous conversation, the performance of ASR is degraded by certain human habits, such as a rapid speaking rate. This results in shorter syllable durations in spontaneous speech when compared with read speech, which can lead to the syllable contraction (SC) phenomenon.
The goal of this thesis is to detect and correct errors caused by SC. We propose an approach which relaxes word boundaries to obtain probable SC candidates, which are used to detect syllable contraction. After extending the word graph with SC words, we propose a graphical-model-based approach to rescore all probable paths. This approach includes an acoustic model, a language model and a syllable contraction duration model (SCDM), which includes SC duration information (SCDI) and the SC conditional probability. After rescoring, the correction is obtained by finding the best path in the word graph.
The proposed approach was evaluated on the Mandarin Conversational Dialogue Corpus (MCDC), which was collected and annotated by Academia Sinica. The recall rate on SC word correction improved by about 22% using the SCDM alone and by about 41% when the SCDM was combined with a syllable pair acoustic model (SPAM). The syllable and word recognition rates improved by 1.7% and 2.3%, respectively, using the SCDM alone, and by 2.9% and 3.6%, respectively, when the SCDM was combined with the SPAM. The experimental results show that our approach can be used to detect and correct contracted syllables in spontaneous speech.
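The rescoring step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: this record does not give the actual scoring formula, so the log-linear combination, the weights `lm_weight` and `dur_weight`, the `Edge` fields, and all numbers in the toy graph are assumptions made for illustration. The sketch only shows the general idea of adding a contracted-word edge to a word graph and choosing the best path from combined acoustic, language-model, and duration scores.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    start: int   # source node; nodes are numbered in time order
    end: int     # target node, always greater than start
    word: str
    am: float    # acoustic log-probability (made-up value)
    lm: float    # language-model log-probability (made-up value)
    dur: float   # duration-model log-probability (hypothetical SCDM score)


def best_path(edges, first, last, lm_weight=1.0, dur_weight=1.0):
    """Return (score, words) for the highest-scoring path from first to last.

    Because every edge goes from a lower to a higher node number, visiting
    nodes in increasing order guarantees a node's best score is final
    before its outgoing edges are expanded.
    """
    outgoing = defaultdict(list)
    for e in edges:
        outgoing[e.start].append(e)

    best = {first: (0.0, [])}
    for node in range(first, last):
        if node not in best:
            continue  # node is unreachable from the start node
        score, words = best[node]
        for e in outgoing[node]:
            cand = score + e.am + lm_weight * e.lm + dur_weight * e.dur
            if e.end not in best or cand > best[e.end][0]:
                best[e.end] = (cand, words + [e.word])
    return best.get(last)


# Toy word graph: "jiang" stands for a misrecognized contracted syllable,
# and "zheyang" is the contracted-word edge added over the same span.
toy_graph = [
    Edge(0, 1, "wo", -5.0, -1.0, -0.5),
    Edge(1, 3, "jiang", -10.0, -6.0, -3.0),
    Edge(1, 3, "zheyang", -11.0, -2.0, -0.5),   # added SC edge
    Edge(1, 2, "zhe", -9.0, -1.0, -4.0),
    Edge(2, 3, "yang", -9.0, -1.0, -4.0),
]

score, words = best_path(toy_graph, first=0, last=3)
print(score, words)
```

In this toy graph the added contracted-word edge wins because its language-model and duration scores outweigh its slightly worse acoustic score. In a real system the weights would be tuned on held-out data, and word posteriors would typically be computed over the whole graph, not read off a single best path.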
Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 Preface
1.2 Motivation and Objectives
1.3 Related Work
1.4 Overview of the Method
1.5 Chapter Outline
Chapter 2 System Architecture
2.1 Training
2.2 Recognition
Chapter 3 Selection and Detection of Syllable Contraction Candidates
3.1 Screening of Syllable Contraction Candidates
3.2 Syllable Contraction Conditional Probability
Chapter 4 Analysis of Frame-State Lengths of Candidates
4.1 Analysis of State Transition Lengths of Candidates
4.2 Misrecognized Contracted Syllables versus Normal Syllables
4.3 Syllable Contraction Duration Information
Chapter 5 Correcting Syllable Contraction Words and Recomputing Probabilities
5.1 Recognizer and Word Graph
5.2 Expanding Syllable Contraction and Recomputing Probabilities
Chapter 6 Experimental Results and Analysis
6.1 Corpus
6.2 Acoustic Features and Models
6.3 Language Model
6.4 Evaluation Method
6.5 Baseline System
6.6 Analysis of Results
Chapter 7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Research Directions
References
Appendix
[ 1] E. Fosler-Lussier and Nelson Morgan, “Effects Of Speaking Rate And Word Frequency On Conversation Pronunciations,” Speech Communication, vol. 29, pp. 137-158, 1999
[ 2] S.-C. Tseng and Y.-F. Liu, “Annotation of Mandarin Conversational Dialogue Corpus,” CKIP Technical Report, No. 02-01, Academia Sinica, 2002
[ 3] J. Bernstein, G. Baldwin, W. Cohen, H. Murveit, and M. Weintraub, “Phonological studies for speech recognition,” in DARPA Speech Recognition Workshop, pp. 41-48, 1992
[ 4] S.-C. Tseng, “Contracted Syllables in Mandarin: Evidence from Spontaneous Conversation,” Journal of Language and Linguistics, pp. 153-180, 2005
[ 5] S.-C. Tseng, “Features of Contracted Syllables of Spontaneous Mandarin,” in the Proc. of EUROSPEECH2003, pp. 77-80, 2003
[ 6] S.-C. Tseng, “Syllable Contraction in a Mandarin Conversation Dialogue Corpus,” International Journal of Corpus Linguistics, pp. 63-83, 2005
[ 7] Robert L. Cheng, “Sub-syllable Morphemes in Taiwanese,” Journal of Chinese Linguistics, vol. 13, pp. 141-144, 1985
[ 8] Charles Li and Sandra Thompson, Mandarin Chinese: A Functional Reference Grammar, University of California Press, 1981
[ 9] Chung, Raung-Fu, Syllable contraction in Chinese. Chinese Language and Linguistics III: Morphology and Lexicon, ed. by Feng-fu Tsao and H. Samuel Wang. Taipei: Institute of History and Philology, Academia Sinica, 1997
[10] D. Jurafsky, A. Bell, M. Gregory, and W.D. Raymond, “The Effect of Language Model Probability on Pronunciation Reduction,” in the Proc. of IEEE ICASSP, pp. 801-804, 2001
[11] M.-Y. Tsai, F.-C. Chou, and L.-S. Lee, “Pronunciation Modeling With Reduced Confusion for Mandarin Chinese Using Three-Stage Framework,” IEEE Transactions on Audio, Speech and Language Processing, pp. , 2007
[12] M. Weintraub, E. Fosler, C. Galles, Y.-H. Kao, S. Khudanpur, M. Saraclar, and S. Wegmann, “WS96 project report: Automatic learning of word pronunciation from data,” presented at the JHU Workshop Pronunciation Group, 1996.
[13] T. Holter and T. Svendsen, “Maximum likelihood modeling of pronunciation variation,” Speech Commun., vol. 29, pp. 177-191, 1999.
[14] M. Finke and A. Waibel, “Flexible transcription alignment,” in Automatic Speech Recognition and Understanding Workshop, 1997, pp. 34-40.
[15] N. Cremelie and J.-P. Martens, “In search of better pronunciation models for speech recognition,” Speech Commun., vol. 29, pp. 115-136, 1999
[16] E. Fosler-Lussier, “Multi-level decision tree for static and dynamic pronunciation models,” in Eur. Conf. Speech Commun. Technol., 1999, pp. 463-466.
[17] Yi Liu, and Pascale Fung, “Pronunciation Modeling for Spontaneous Mandarin Speech Recognition,” International Journal of Speech Technology, 2004
[18] Saraclar, M., Nock, H., and Khudanpur, S. “Pronunciation modeling by sharing Gaussian densities across phonetic models.” Computer Speech and Language, 14:137–160. 2000
[19] Saraclar, M. and Khudanpur, S. “Pronunciation ambiguity vs. pronunciation variability in speech recognition.” ICASSP’00 Proceedings. Istanbul Turkey: ICASSP, pp. 1679–1682. 2000
[20] Saraclar, M. “Pronunciation modeling for conversational speech recognition.” PhD thesis, The Johns Hopkins University, Baltimore, MD, 2000
[21] L.-Y. Sun, and Y.-R. Wang, “An Analysis Modeling of Syllable Contraction in Spontaneous Mandarin Speech Recognition,” Master Thesis, Dept. of Communication Engineering, NCTU, Taiwan, 2004
[22] Y.-S. Lo, and S.-H. Chen, “An Implementation of Spontaneous Mandarin Speech Recognition Baseline System,” Master Thesis, Dept. of Communication Engineering, NCTU, Taiwan, 2005
[23] S. Ortmanns, H. Ney, and X. Aubert, “A Word Graph Algorithm for Large Vocabulary Continuous Speech Recognition,” Computer Speech and Language, pp. 43-72, 1997
[24] F. Wessel, R. Schlüter, and H. Ney, “Using posterior probabilities for improved speech recognition,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing 2000, Istanbul, Turkey, June 2000, pp. 1587-1590
[25] B. Rueber, “Obtaining confidence measures from sentence probabilities,” in Proc. 5th Eur. Conf. Speech Communication Technology, 1997.
[26] Z.-Y. Zhou, Helen Meng, and W.-K. Lo, “A Multi-Pass Error Detection and Correction Framework for Mandarin LVCSR,” in the Proc. of IEEE ICSLP, pp. , 2006
[27] F.K. Soong, W.-K. Lo, and S. Nakamura, “Generalized Word Posterior Probability For Measuring Reliability of Recognized Word,” in the Proc. of SWIM2004, 2004
[28] Michael I. Jordan, “An Introduction to Probabilistic Graphical Models,” MIT Press, 1999
[29] X. Huang, Alex Acero, and H.-W. Hon “Spoken Language Processing” page 558 2001
[30] MAT Speech Database – TCC-300 (http://rocling.iis.sinica.edu.tw/ROCLING/MAT/Tcc_300brief.htm)
[31] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, P. Woodland, “HTK Book,” for HTK Version 3.4, 2006