
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Graduate Student: 陳淼海
Graduate Student (English): Miao-Hai Chen
Thesis Title (Chinese): 基於盲訊號分離語音增強技術之遠距離雜訊語音辨識
Thesis Title (English): Speech Enhancement Technique Based on Blind Source Separation for Far-Field Noisy Speech Recognition
Advisor: 王駿發
Advisor (English): Jhing-Fa Wang
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Electrical Engineering (Master's and Doctoral Program)
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2009
Graduation Academic Year: 97 (ROC calendar)
Language: English
Number of Pages: 65
Chinese Keywords: 語音辨識, 麥克風陣列, 子空間語音增強, 盲訊號分離法
English Keywords: microphone array, speech recognition, blind source separation, subspace speech enhancement
Usage statistics:
  • Cited: 2
  • Views: 436
  • Rating:
  • Downloads: 43
  • Saved to personal bibliography lists: 1
Speech is the most primitive and most effective way for humans to communicate with one another. As technology advances, we rely on computers to carry out many complex computations and applications, so enabling computers to understand human language has become an important link in speech processing. With speech recognition technology steadily improving, how to recognize the spoken information we express more efficiently and more accurately is our principal concern.
Looking at the speech recognition systems commonly available today, they can be divided by usage environment into near-field handheld-microphone systems and far-field hands-free systems. Near-field handheld microphones generally achieve quite satisfactory recognition rates, since they are less susceptible to interference that leads to recognition errors; far-field recording environments, by contrast, are usually affected by noise and reverberation, which degrade the recognition rate. By recognition core, systems can further be divided into speaker-dependent and speaker-independent ones: a speaker-dependent system usually requires a training phase before use to build an acoustic model for the specific speaker, whereas a speaker-independent system does not. In terms of ease of use, a far-field hands-free device combined with a speaker-independent recognizer is the most readily accepted kind of system.
A far-field hands-free device generally picks up sound through a deployed microphone array, which is mainly used to improve and enhance speech quality or to compensate for the shortcomings of single-channel recording. In this thesis we use two microphones to record the sound, extract the component that most resembles speech through blind source separation, and then remove the residual noise from the extracted speech by subspace speech enhancement so that it can be used for speech recognition. For the back-end recognizer we use the HTK toolkit provided by the University of Cambridge and check whether the recognition results are correct. Experimental results show that the proposed system works in a variety of noisy environments, effectively improves the recognition rate by more than 20%, and raises noisy speech with an original SNR of 0 to 10 dB to above 20 dB.
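To make the front end concrete, the following is a minimal Python sketch of the blind-source-separation step described above, assuming an instantaneous (non-convolutive) two-channel mixture and using scikit-learn's FastICA; the file names, the kurtosis-based channel selection, and all parameters are illustrative assumptions rather than the thesis's actual implementation.

```python
# Minimal sketch of the two-microphone BSS front end, assuming an
# instantaneous (non-convolutive) mixture of mono recordings.  File names,
# the kurtosis-based selection rule, and all parameters are illustrative.
import numpy as np
from scipy.io import wavfile
from scipy.stats import kurtosis
from sklearn.decomposition import FastICA

fs1, mic1 = wavfile.read("mic_left.wav")    # hypothetical array recordings
fs2, mic2 = wavfile.read("mic_right.wav")
assert fs1 == fs2

n = min(len(mic1), len(mic2))
X = np.stack([mic1[:n], mic2[:n]], axis=1).astype(np.float64)

# FastICA estimates two maximally non-Gaussian components from the mixtures.
ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
S = ica.fit_transform(X)                    # shape: (n_samples, 2)

# Keep the component that looks most speech-like; kurtosis serves as a simple
# super-Gaussianity measure (speech is typically more super-Gaussian than
# stationary background noise).
speech = S[:, int(np.argmax(kurtosis(S, axis=0)))]

# Normalize and save the selected component for the enhancement stage.
speech /= np.max(np.abs(speech)) + 1e-12
wavfile.write("bss_output.wav", int(fs1), (speech * 32767).astype(np.int16))
```

In practice, far-field room recordings are convolutive mixtures, so a frequency-domain or convolutive BSS variant would normally replace the plain instantaneous model sketched here.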
Speech is the most primitive and efficient method of human communication. With the development of science and technology, we use computers to accomplish many complex operations and applications. Therefore, how to make computers understand human language is one of the most important parts of speech processing. As speech recognition is the first step toward language understanding, recognizing speech information more efficiently and accurately is our principal issue.
In general, speech recognition systems can be classified into two types according to the environment of use. The first type uses a near-field handheld microphone and the other a far-field hands-free device. A near-field handheld microphone achieves a better recognition rate because it suffers little interference from the environment, whereas for a far-field hands-free device, noise and reverberation easily degrade the recognition rate.
Besides, speech recognition algorithms can be classified into speaker-dependent and speaker-independent ones. Speaker-dependent speech recognition needs a training procedure to construct the corresponding acoustic model for each speaker. For speaker-independent speech recognition, a generalized acoustic model is trained so that no further training is required for use. Considering the convenience for users, integrating a far-field hands-free device with a speaker-independent speech recognition algorithm is widely accepted.
A far-field hands-free device usually collects sound through a microphone array. A microphone array can enhance speech quality or make up for the limitations of a single-channel microphone. In this thesis, we use two microphones to collect the sound and extract the target speech by blind source separation (BSS). The residual noise is then removed by the subspace enhancement method. Finally, the enhanced speech is recognized by a speech recognition system, which we construct with the HTK toolkit developed by the University of Cambridge. The experimental results show that the proposed system is suitable for the several noisy environments presented, and it effectively improves the recognition rate by 20%. For the SNR evaluation, the proposed system yields enhanced speech with an SNR 20 dB higher than that of the original corrupted speech, which ranged from 0 dB to 10 dB.
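The subspace enhancement stage can be sketched in the spirit of the classic KLT-based signal-subspace estimator of Ephraim and Van Trees; the frame length, the assumption that a leading segment is noise only, and the Wiener-style eigenvalue gain below are simplifying assumptions, not the exact configuration used in the thesis.

```python
# Minimal sketch of a KLT-based signal-subspace enhancement step applied to
# the BSS output.  The frame length, the "first 0.25 s is noise only"
# assumption, and the Wiener-style eigenvalue gain are illustrative
# simplifications, not the thesis's configuration.
import numpy as np
from scipy.io import wavfile

fs, x = wavfile.read("bss_output.wav")          # output of the BSS sketch above
x = x.astype(np.float64) / 32768.0

K = 32                                          # KLT vector (frame) length
noise_var = float(np.var(x[: int(0.25 * fs)]))  # crude white-noise variance estimate

frames = x[: len(x) // K * K].reshape(-1, K)
Ry = np.cov(frames.T)                           # covariance of the noisy vectors

# Eigen-decompose the noisy covariance, estimate clean-speech eigenvalues by
# subtracting the noise variance, and form a Wiener-style gain per eigenvector.
w, V = np.linalg.eigh(Ry)
lam = np.maximum(w - noise_var, 0.0)
H = V @ np.diag(lam / (lam + noise_var + 1e-12)) @ V.T

enhanced = (frames @ H.T).reshape(-1)           # apply the linear estimator per frame
out = (np.clip(enhanced, -1.0, 1.0) * 32767).astype(np.int16)
wavfile.write("enhanced.wav", int(fs), out)
```

The enhanced waveform would then be passed to the HTK-based recognizer (feature extraction with HCopy and decoding with HVite in a typical HTK setup).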
Abstract (Chinese) I
Abstract III
Acknowledgements V
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Thesis Objectives 2
1.3 Thesis Organization 3
Chapter 2 Review and Related Works 4
2.1 Microphone Array 4
2.1.1 Characteristics of Microphone Array 5
2.1.2 Speech Recognition Using Microphone Array 10
2.2 Blind Source Separation 12
2.2.1 The Problems Stated with BSS 12
2.2.2 Common Algorithms for BSS 13
2.3 Independent Component Analysis 14
2.3.1 Basic Concept & Theory 14
2.3.2 Central Limit Theorem 19
2.3.3 Pre-processing of ICA 20
2.3.4 Related Applications 22
Chapter 3 Framework of Proposed System 24
3.1 Overview of Proposed System 24
3.2 Introduction to FastICA 25
3.2.1 Objective Function 27
3.2.2 Optimal Method 30
3.2.3 Process Flow Structure 32
3.2.4 Problems for Practical Use 33
3.3 The Proposed System Based on FastICA 34
3.3.1 Structure of the System 35
3.3.2 Signal Subspace Speech Enhancement 36
3.3.3 Voice Activity Detection 39
3.4 Automatic Speech Recognition System 42
3.4.1 HTK Toolkit 43
3.4.2 Feature Extraction from Speech 44
3.4.3 Speech Recognition via HTK 46
Chapter 4 Experimental Design and Results 48
4.1 Experimental Equipment 48
4.1.1 Microphone Array 48
4.1.2 Pre-Amplifier Circuit 50
4.2 Experimental Setup 51
4.2.1 Environment Setting 51
4.2.2 Experimental Training Corpora 53
4.3 Experimental Results 54
4.3.1 Interface of Our System 54
4.3.2 Speech-to-Noise Ratio 55
4.3.3 Speech Recognition Rate 58
Chapter 5 Conclusion and Future Work 62
References 63