National Digital Library of Theses and Dissertations in Taiwan

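Author: 林俊宇 (Chun-Yu Lin)
Thesis title: Language Identification of Language-Mixed Speech Using Latent Semantic Indexing and Language Model
Thesis title (Chinese): 應用隱含式語意索引與語言模型於中英夾雜語音之語言鑑別
Advisor: 吳宗憲 (Chung-Hsien Wu)
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Computer Science and Information Engineering (master's and doctoral program)
Field: Engineering
Discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: Chinese
Number of pages: 64
Keywords (Chinese): 語言鑑別; 隱含式語意索引; 語言模型
Keywords (English): Language Identification; Language Model; Latent Semantic Indexing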
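With the growing global exchange of information and the convenience of communication, human-machine interfaces capable of handling multiple languages are becoming increasingly important. Faced with different languages, and even with language mixing, a dialogue application must determine which language is used in the user's utterance before speech recognition can proceed. Current research on language identification mostly focuses on utterances spoken in a single language; architecturally, the approaches can be roughly divided into Gaussian mixture models and language models built on single-language phone recognition or parallel phone recognition.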
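This thesis proposes a flexible and efficient front-end detection mechanism to handle language mixing within a single utterance. Our work focuses on the following: 1) using the Bayesian information criterion (BIC) to segment the speech according to variation in its acoustic characteristics; 2) for the different types of segment, applying the latent semantic indexing concept to the discriminative parameters and training a separate Gaussian mixture model for each; 3) integrating a vector-quantization-based bigram language model to strengthen segment-level language identification; and 4) finally, applying a linear filter and dynamic programming to smooth the whole utterance and then detect the language boundaries.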
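As an illustration of step 1, the following is a minimal sketch of BIC-based change-point detection on a window of feature frames. It is not the thesis's implementation: the diagonal-covariance Gaussian, the `penalty_weight` and `margin` parameters, the function names, and the synthetic two-dimensional data are all assumptions made for this example. A candidate split is accepted when modeling the two halves separately improves the likelihood by more than the BIC penalty for the extra parameters.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of a diagonal covariance: the sum of per-dimension log variances."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        column = [frame[k] for frame in frames]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        total += math.log(max(var, 1e-8))  # floor the variance to avoid log(0)
    return total


def delta_bic(frames, i, penalty_weight=1.0):
    """BIC gain of splitting frames at index i versus modeling them as one Gaussian."""
    n = len(frames)
    dim = len(frames[0])
    gain = 0.5 * (n * log_det_diag(frames)
                  - i * log_det_diag(frames[:i])
                  - (n - i) * log_det_diag(frames[i:]))
    # A second diagonal Gaussian adds `dim` means and `dim` variances.
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


def find_change_point(frames, margin=10, penalty_weight=1.0):
    """Return (index, score) of the best split with positive delta-BIC, or (None, 0.0)."""
    best_i, best_score = None, 0.0
    for i in range(margin, len(frames) - margin):
        score = delta_bic(frames, i, penalty_weight)
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score


def synthetic_frames(seed=0, n_each=100):
    """Hypothetical test data: two segments whose means differ, joined at n_each."""
    rng = random.Random(seed)
    first = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_each)]
    second = [[rng.gauss(5.0, 1.0), rng.gauss(5.0, 1.0)] for _ in range(n_each)]
    return first + second
```

Calling `find_change_point(synthetic_frames())` should return an index close to 100, where the two synthetic segments meet. On real speech the frames would be acoustic feature vectors, and the search would typically be repeated over a sliding or growing window to find several boundaries.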
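For the experiments, 5304 Mandarin-English mixed utterances (3 speakers), 500 Mandarin and 500 English utterances of 3 to 5 seconds each (5 speakers, Database 1), and 250 Mandarin and 250 English utterances of about 15 seconds each (5 speakers, Database 2) were collected; 80% were used as training data and the remaining 20% as test data. The results show that, for language-mixed speech, the segment-level language identification rate reaches 74% and the F-measure for language boundary detection reaches 0.62. For single-language identification, Database 1 and Database 2 achieve identification rates of 0.79 and 0.90, respectively. Compared with other methods, the proposed method gives a clear improvement in identification rate.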
With the globalization of information exchange and communication, human-machine interfaces that can process multiple languages, distinguish between them, and provide interconnected services are becoming increasingly important. In multilingual spoken-language and dialogue applications, handling multiple-language or mixed-language input is crucial for speech recognition. Recent research on automatic language identification (LID) and recognition has tried to keep up with the growing demand from applications. These approaches have emphasized determining the language in which a single utterance was spoken. From a framework viewpoint, they can be categorized by whether they build language-dependent or language-independent recognizers, for example Gaussian mixture modeling, or single-language phone recognition or parallel phone recognition followed by language modeling.
In this thesis, a flexible and efficient front-end architecture is proposed for segmenting speech and identifying the languages mixed within a single utterance. More specifically, this study focuses on: 1) adopting the Bayesian information criterion (BIC) with language-dependent acoustic features to divide the input utterance into several acoustically associated segments; 2) proposing a feature-discriminative, language-dependent GMM based on the Latent Semantic Indexing approach to measure the strength of each language in each segment; 3) integrating a VQ-based bigram language model into a MAP-based decision mechanism for language identification; and 4) finally, applying linear filtering and dynamic programming for smoothing and precise language boundary estimation.
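To make step 3 more concrete, here is a minimal sketch of a VQ-based bigram language model used to pick a language for one segment. It rests on several assumptions that the abstract does not confirm: the codebook is given, add-one smoothing is used, the language priors are equal (so the MAP decision reduces to maximum likelihood), and all class and function names are hypothetical. Frames are first mapped to codeword indices, and each language then scores the resulting index sequence with its own bigram statistics.

```python
import math
from collections import defaultdict


def quantize(frames, codebook):
    """Map each frame to the index of its nearest codeword (squared Euclidean distance)."""
    def nearest(frame):
        return min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, codebook[k])),
        )
    return [nearest(frame) for frame in frames]


class BigramModel:
    """Bigram model over codeword indices with add-one smoothing (an assumption)."""

    def __init__(self, codebook_size):
        self.size = codebook_size
        self.pair_counts = defaultdict(int)
        self.context_counts = defaultdict(int)

    def train(self, sequences):
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def log_prob(self, seq):
        """Sum of smoothed log P(cur | prev) over the sequence."""
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            numerator = self.pair_counts.get((prev, cur), 0) + 1
            denominator = self.context_counts.get(prev, 0) + self.size
            total += math.log(numerator / denominator)
        return total


def identify(seq, models):
    """Return the language whose bigram model gives the highest log-probability."""
    return max(models, key=lambda lang: models[lang].log_prob(seq))
```

A toy usage with a four-codeword codebook: train one `BigramModel(4)` on sequences such as `[0, 1, 0, 1, 0, 1]` and another on `[2, 3, 2, 3, 2, 3]`, then call `identify([0, 1, 0, 1], models)` with a dictionary mapping language names to the two models. In a fuller system this score would be combined with the GMM score from step 2 before the decision is made.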
To evaluate the proposed approach, we collected a corpus of 5304 Mandarin-English mixed utterances (3 male speakers), 500 single-language utterances with durations of 3 to 5 seconds (Database 1), and 250 single-language utterances with durations of about 15 seconds (Database 2). 80% of the corpus was used for training and the remaining 20% for testing. Experimental results show that the proposed mixed-language decision mechanism achieved 74% accuracy, and the F-measure for language boundary detection was 0.62. The LID rates for Database 1 and Database 2 were 0.79 and 0.90, respectively. The proposed architecture outperforms other well-established approaches. This study is aimed at multilingual speech recognition.
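The F-measure reported for boundary detection combines precision and recall. The sketch below shows one common way to compute it for detected boundaries against reference boundaries. The abstract does not state how a detected boundary was counted as correct, so the `tolerance` window, the greedy one-to-one matching, and the function name are assumptions for illustration only.

```python
def boundary_f_measure(reference, detected, tolerance=0.1):
    """Return (precision, recall, F) for boundary times given in seconds.

    A detected boundary counts as a hit if it lies within `tolerance` of a
    reference boundary that has not been matched yet (greedy matching).
    """
    unmatched = sorted(reference)
    hits = 0
    for boundary in sorted(detected):
        match = next((r for r in unmatched if abs(r - boundary) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)  # each reference boundary is matched at most once
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value
```

For example, with reference boundaries `[1.0, 2.5, 4.0]` and detected boundaries `[1.05, 2.9, 4.02, 5.0]`, two detections fall within the tolerance, giving a precision of 2/4, a recall of 2/3, and an F-measure of 4/7.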
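Chinese abstract
Table of contents
List of figures and tables
Chapter 1 Introduction
1.1 Foreword
1.2 Research motivation and objectives
1.3 Literature review and discussion of related techniques
1.3.1 Phone Recognition Language Model (PRLM)
1.3.2 Vector Quantization
1.3.3 Gaussian Mixture Model
1.4 Overview of the research method
1.5 Chapter outline
Chapter 2 Language identification system for Mandarin-English mixed speech
2.1 Speech pre-segmentation and segment processing
2.2 Construction of the language bigram model
2.3 Segment index vectorization and construction of the Gaussian mixture language model
2.4 Construction of the language scoring and decision model
2.5 Smoothing and post-processing of speech segments
2.6 Subdivision of Mandarin-English mixed segments
Chapter 3 Corpus design and experimental results
3.1 Corpus design
3.1.1 Design method
3.2 Experimental results and discussion
3.2.1 Experimental setup
3.2.2 Bayesian information criterion for acoustic boundary detection
3.2.3 Effectiveness of the LSI-based GMM for language identification
3.2.4 Effectiveness of the bigram language module for language identification
3.2.5 Effectiveness of combining the LSI module and the bigram language module for language identification
3.2.6 Single-language identification rate
Chapter 4 Conclusions and future work
4.1 Conclusions
4.2 Future work
References
Appendix 1 Mandarin SAMPA_C table
Appendix 2 English SAMPA table
Appendix 3 Corpus of Mandarin-English mixed sentences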