



Author: Miao-Hai Chen
Thesis title: Speech Enhancement Technique Based on Blind Source Separation for Far-Field Noisy Speech Recognition
Advisor: Jhing-Fa Wang
Keywords: microphone array; speech recognition; blind source separation; subspace speech enhancement
Speech is the most basic and efficient means of human communication. As science and technology advance, computers are used for many complex operations and applications, so enabling computers to understand human language is one of the most important problems in speech processing. Because speech recognition is the first step toward language understanding, our principal concern is recognizing speech more efficiently and accurately.
In general, speech recognition systems fall into two types according to the environment of use: near-field handheld microphones and far-field hands-free devices. A near-field handheld microphone achieves a higher recognition rate because it picks up little interference from the environment; for a far-field hands-free device, noise and reverberation readily lower the recognition rate.
Speech recognition algorithms can also be classified as speaker-dependent or speaker-independent. Speaker-dependent recognition requires a training procedure to build an acoustic model for each speaker. Speaker-independent recognition trains a generalized acoustic model, so no further training is needed before use. For user convenience, far-field hands-free devices are commonly combined with a speaker-independent recognition algorithm.
A far-field hands-free device usually captures sound through a microphone array, which can enhance speech quality and compensate for the limitations of a single-channel microphone. In this thesis, we use two microphones to capture the sound and extract the target speech by blind source separation (BSS). The residual noise is then removed by a signal subspace enhancement method. Finally, the enhanced speech is recognized by a speech recognition system built with HTK, the toolkit developed at the University of Cambridge. The experimental results show that the proposed system is suitable for the several noisy environments tested and improves the recognition rate by 20%. In the SNR evaluation, the proposed system raises the SNR of the enhanced speech by 20 dB over the original corrupted speech, whose SNR ranged from 0 dB to 10 dB.
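The separation step described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes an instantaneous (non-convolutive) two-microphone mixture, uses synthetic non-Gaussian signals in place of real speech and noise, and omits the subspace enhancement, voice activity detection, and HTK recognition stages. The function names (`whiten`, `sym_decorrelate`, `fastica`, `scaled_snr_db`) and the mixing matrix are hypothetical choices made for illustration.

```python
import numpy as np

def whiten(x):
    """Center and whiten mixtures x of shape (channels, samples)."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    # Multiply by cov^(-1/2) so the result has identity covariance.
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ x

def sym_decorrelate(w):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    vals, vecs = np.linalg.eigh(w @ w.T)
    return (vecs / np.sqrt(vals)) @ vecs.T @ w

def fastica(x, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with a tanh nonlinearity; returns estimated sources."""
    z = whiten(x)
    n, t = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        g_prime = 1.0 - g ** 2
        # Fixed-point update for each row: E[z g(w^T z)] - E[g'(w^T z)] w
        w_new = sym_decorrelate((g @ z.T) / t - np.diag(g_prime.mean(axis=1)) @ w)
        # Converged when every row is (anti)parallel to its previous value.
        converged = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w @ z

def scaled_snr_db(target, estimate):
    """SNR of estimate against target after the best least-squares gain.

    ICA recovers sources only up to sign and scale, so the estimate is
    rescaled before the residual is measured.
    """
    target = target - target.mean()
    estimate = estimate - estimate.mean()
    gain = (target @ estimate) / (estimate @ estimate)
    residual = target - gain * estimate
    return 10.0 * np.log10((target @ target) / (residual @ residual))

# Synthetic stand-ins (assumption): a Laplacian "target" and uniform "noise".
rng = np.random.default_rng(1)
n_samples = 20000
sources = np.vstack([rng.laplace(size=n_samples),
                     rng.uniform(-1.0, 1.0, size=n_samples)])
sources /= sources.std(axis=1, keepdims=True)

mixing = np.array([[1.0, 0.6], [0.5, 1.0]])  # hypothetical 2x2 mixing matrix
mics = mixing @ sources                      # two simulated microphone signals
estimates = fastica(mics)

snr_in = scaled_snr_db(sources[0], mics[0])
# Output order is arbitrary, so take the component that best matches the target.
snr_out = max(scaled_snr_db(sources[0], y) for y in estimates)
print(f"SNR before: {snr_in:.1f} dB, after: {snr_out:.1f} dB")
```

Because the sketch uses an idealized instantaneous mixture, the SNR it prints is not comparable to the figures reported in the abstract; recordings made in a reverberant room are convolutive mixtures and are considerably harder to separate.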
Abstract (Chinese)
Abstract (English)
Acknowledgements
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Thesis Objectives
1.3 Thesis Organization
Chapter 2 Review and Related Works
2.1 Microphone Array
2.1.1 Characteristics of Microphone Array
2.1.2 Speech Recognition Using Microphone Array
2.2 Blind Source Separation
2.2.1 The Problems Stated with BSS
2.2.2 Common Algorithms for BSS
2.3 Independent Component Analysis
2.3.1 Basic Concept & Theory
2.3.2 Central Limit Theorem
2.3.3 Pre-processing of ICA
2.3.4 Related Applications
Chapter 3 Framework of Proposed System
3.1 Overview of Proposed System
3.2 Introduction to FastICA
3.2.1 Objective Function
3.2.2 Optimal Method
3.2.3 Process Flow Structure
3.2.4 Problems for Practical Use
3.3 The Proposed System Based on FastICA
3.3.1 Structure of the System
3.3.2 Signal Subspace Speech Enhancement
3.3.3 Voice Activity Detection
3.4 Automatic Speech Recognition System
3.4.1 HTK Toolkit
3.4.2 Feature Extraction from Speech
3.4.3 Speech Recognition via HTK
Chapter 4 Experimental Design and Results
4.1 Experimental Equipment
4.1.1 Microphone Array
4.1.2 Pre-Amplifier Circuit
4.2 Experimental Setup
4.2.1 Environment Setting
4.2.2 Experimental Training Corpora
4.3 Experimental Results
4.3.1 Interface of Our System
4.3.2 Speech-to-Noise Ratio
4.3.3 Speech Recognition Rate
Chapter 5 Conclusion and Future Works
References