[1] M. Afify, O. Siohan and R. Sarikaya, "Gaussian mixture language models for speech recognition," in Proc. International Conference on Acoustics, Speech and Signal Processing, vol. 4, pp. 29-32, 2007.
[2] J. Bellegarda, "Exploiting latent semantic information in statistical language modeling," Proceedings of the IEEE, vol. 88, no. 8, pp. 1279-1296, 2000.
[3] J. Bellegarda, "Statistical language model adaptation: review and perspectives," Speech Communication, vol. 42, pp. 93-108, 2004.
[4] Y. Bengio, R. Ducharme, P. Vincent and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137-1155, 2003.
[5] M. Berry, S. Dumais and G. O'Brien, "Using linear algebra for intelligent information retrieval," SIAM Review, vol. 37, pp. 573-595, 1995.
[6] D. Blei, A. Ng and M. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[7] P. Brown, J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, J. Lafferty, R. Mercer and P. Roossin, "A statistical approach to machine translation," Computational Linguistics, vol. 16, pp. 79-85, 1990.
[8] P. Brown, V. Della Pietra, P. De Souza, J. Lai and R. Mercer, "Class-based n-gram models of natural language," Computational Linguistics, vol. 18, no. 4, pp. 467-479, 1992.
[9] C. Chelba and F. Jelinek, "Structured language modeling," Computer Speech and Language, vol. 14, no. 4, pp. 283-332, 2000.
[10] S. F. Chen and J. Goodman, "An empirical study of smoothing techniques for language modeling," Computer Speech and Language, vol. 13, pp. 359-394, 1999.
[11] J.-T. Chien, "Association pattern language modeling," IEEE Trans. Audio, Speech and Language Processing, vol. 14, no. 5, pp. 1719-1728, 2006.
[12] J.-T. Chien, M.-S. Wu and H.-J. Peng, "On latent semantic language modeling and smoothing," in Proc. International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 1373-1376, 2004.
[13] C.-H. Chueh, J.-T. Chien and H. Wang, "A maximum entropy approach for integrating semantic information in statistical language models," in Proc. International Symposium on Chinese Spoken Language Processing, pp. 309-312, 2004.
[14] S. Deerwester, S. Dumais, G. Furnas, T. Landauer and R. Harshman, "Indexing by latent semantic analysis," Journal of the American Society of Information Science, vol. 41, pp. 391-407, 1990.
[15] S. Della Pietra, V. Della Pietra, R. Mercer and S. Roukos, "Adaptive language modeling using minimum discriminant estimation," in Proc. International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 633-636, 1992.
[16] A. Dempster, N. Laird and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, vol. 39, pp. 1-38, 1977.
[17] A. Emami, P. Xu and F. Jelinek, "Using a connectionist model in a syntactical based language model," in Proc. International Conference on Acoustics, Speech and Signal Processing, pp. 372-375, 2003.
[18] M. Federico, "Bayesian estimation methods for n-gram language model adaptation," in Proc. International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 240-243, 1996.
[19] M. Federico, "Efficient language model adaptation through MDI estimation," in Proc. Eurospeech, pp. 1583-1586, 1999.
[20] R. Florian and D. Yarowsky, "Dynamic nonlocal language model adaptation via hierarchical topic-based adaptation," in Proc. ACL, pp. 167-174, 1999.
[21] D. Gildea and T. Hofmann, "Topic-based language models using EM," in Proc. Eurospeech, pp. 2167-2170, 1999.
[22] I. J. Good, "The population frequencies of species and the estimation of population parameters," Biometrika, vol. 40, pp. 237-264, 1953.
[23] Hidden Markov Model Toolkit (HTK), http://htk.eng.cam.ac.uk/.
[24] T. Hofmann, "Probabilistic latent semantic indexing," in Proc. ACM SIGIR, pp. 50-57, 1999.
[25] X. Huang, A. Acero and H. Hon, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall PTR, 2001.
[26] F. Jelinek and R. Mercer, "Interpolated estimation of Markov source parameters from sparse data," in Proc. Workshop on Pattern Recognition in Practice, pp. 381-402, 1980.
[27] S. M. Katz, "Estimation of probabilities from sparse data for the language model component of a speech recognizer," IEEE Trans. Acoustics, Speech and Signal Processing, vol. 35, pp. 400-401, 1987.
[28] S. Khudanpur and J. Wu, "Maximum entropy techniques for exploiting syntactic, semantic and collocational dependencies in language modeling," Computer Speech and Language, vol. 14, pp. 355-372, 2000.
[29] R. Kneser and H. Ney, "Improved backing-off for m-gram language modeling," in Proc. International Conference on Acoustics, Speech and Signal Processing, pp. 181-184, 1995.
[30] L. Lamel, R. Kassel and S. Seneff, "Speech database development: design and analysis of the acoustic-phonetic corpus," in Proc. DARPA Speech Recognition Workshop, pp. 100-109, 1986.
[31] C. Leggetter and P. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech and Language, vol. 9, pp. 171-185, 1995.
[32] G. Lidstone, "Note on the general case of the Bayes-Laplace formula for inductive or a posteriori probabilities," Transactions of the Faculty of Actuaries, vol. 8, pp. 182-192, 1920.
[33] H. Ney, U. Essen and R. Kneser, "On structuring probabilistic dependencies in stochastic language modeling," Computer Speech and Language, vol. 8, pp. 1-38, 1994.
[34] D. Paul and J. Baker, "The design for the Wall Street Journal-based CSR corpus," in Proc. International Conference on Spoken Language Processing, pp. 899-902, 1992.
[35] J. Ponte and W. Croft, "A language modeling approach to information retrieval," in Proc. ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 275-281, 1998.
[36] R. Rosenfeld, "A maximum entropy approach to adaptive statistical language modeling," Computer Speech and Language, vol. 10, pp. 187-228, 1996.
[37] H. Schwenk, "Continuous space language models," Computer Speech and Language, vol. 21, pp. 492-518, 2007.
[38] H. Schwenk and J. Gauvain, "Connectionist language modeling for large vocabulary speech recognition," in Proc. International Conference on Acoustics, Speech and Signal Processing, pp. 765-768, 2002.
[39] Y. Tam and T. Schultz, "Correlated latent semantic model for unsupervised LM adaptation," in Proc. International Conference on Acoustics, Speech and Signal Processing, vol. 4, pp. 41-44, 2007.
[40] K. Vertanen, "Baseline WSJ acoustic models for HTK and Sphinx: training recipes and recognition experiments," Technical Report, Cavendish Laboratory, 2006.
[41] H. Wang and T. Kawahara, "PLSA-based topic detection in meetings for adaptation of lexicon and language model," in Proc. Interspeech, pp. 602-608, 2007.
[42] P. Woodland, J. Odell, V. Valtchev and S. Young, "Large vocabulary speech recognition using HTK," in Proc. International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 125-128, 1994.
[43] J. Wu and S. Khudanpur, "Building a topic-dependent maximum entropy model for very large corpora," in Proc. International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 777-780, 2002.
[44] G. Zhou and K. Liu, "Interpolation of n-gram and mutual-information based trigger pair language models for Mandarin speech recognition," Computer Speech and Language, vol. 13, pp. 125-141, 1999.
[45] I. Zitouni, "Backoff hierarchical class n-gram language models: effectiveness to model unseen events in speech recognition," Computer Speech and Language, vol. 21, no. 1, pp. 88-104, 2007.