Author: Ching-Ta Lu (陸清達)
Thesis title: A Study on Single Channel Speech Enhancement Using Critical-Band-Wavelet-Packet Transform (使用臨界頻帶之封裝式小波轉換於單通道語音增強之研究)
Advisor: Hsiao-Chuan Wang (王小川)
Degree: Doctoral
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar, 2005-2006)
Language: Chinese
Pages: 102
Keywords: speech enhancement; wavelet-packet transform; critical band; masking property
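This dissertation proposes two single-channel speech enhancement techniques. Both use the critical-band wavelet-packet transform to convert the speech signal into wavelet coefficients and remove noise in each subband; the wavelet-coefficient threshold and the gain function of each subband are adjusted by the noise masking threshold.
In the first method, the noisy speech signal is converted into wavelet coefficients, and a threshold is subtracted from the wavelet coefficients of each subband. This threshold is adapted by the segmental signal-to-noise ratio and the noise masking threshold, so the residual noise is effectively suppressed. In noise-dominated subbands the background noise is almost entirely removed, while in speech-dominated subbands the speech distortion is reduced by lowering the wavelet-coefficient threshold. Experimental results show that adapting the speech enhancement system with the auditory masking effect effectively suppresses background noise while keeping speech distortion within an acceptable range.
The second method uses a perceptual constraint to derive a gain factor for speech enhancement, so that the system can handle speech corrupted by various kinds of colored noise. The quality of the enhanced speech is determined by the speech distortion and the residual noise. If the residual noise is below the auditory masking threshold, the human ear cannot perceive it, so the gain is set to one and the speech signal is preserved. Conversely, if the residual noise exceeds the auditory masking threshold, the noise is audible, so the gain factor is reduced to suppress it.
In heavily noisy environments, the noise level is usually overestimated in order to suppress noise effectively, which causes the gain factor to be underestimated. Enhancing speech with such a gain factor distorts the speech and produces a muffled sound. We therefore derive a lower bound on the gain factor to prevent the noisy speech from being over-attenuated. Using the criterion that the speech distortion must be smaller than the residual noise yields this lower bound, and with it a corresponding lower bound on the noise masking threshold. Adapting the gain factor with this lower bound reduces both speech distortion and musical residual noise. Experimental results show that speech processed by this method sounds more natural and that musical residual noise is almost inaudible.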
In this dissertation, we present two new techniques for single channel speech enhancement. These techniques reduce the noise in each subband based on the critical-band-wavelet-packet decomposition. A noise masking threshold (NMT) is employed to adjust the wavelet-coefficient threshold or the gain function of each subband.
The first approach converts a noisy signal into wavelet coefficients (WCs) and subtracts a threshold from the noisy WCs in each subband. The threshold of each subband is adapted according to the segmental SNR and the NMT, so the residual noise is efficiently suppressed. In noise-dominated subbands, the background noise is almost entirely removed. In speech-dominated subbands, the speech distortion is reduced by decreasing the wavelet-coefficient threshold. Experimental results show that the background noise is reduced and that the residual noise is less structured than in a system that does not use masking properties, while the level of speech distortion remains acceptable.
The second approach proposes a gain factor in each wavelet subband subject to a perceptual constraint. This perceptual constraint preserves the WCs of noisy speech when the level of residual noise is smaller than the NMT. A speech enhancement algorithm adapted with the NMT can cope with speech corrupted by various types of colored noise. The quality of the enhanced speech is characterized by a tradeoff between the amount of speech distortion and the level of musical residual noise. If the level of residual noise is smaller than the NMT, the human ear cannot perceive the corrupting noise, and the gain factor is set to unity. Conversely, if the level of residual noise exceeds the NMT, the gain factor is made smaller so that the corrupting noise is suppressed. Because the noise level is usually overestimated at low SNR, the gain factor is underestimated, which results in more speech distortion and a muffled sound. Therefore, we propose a lower bound on the gain factor to prevent the noisy speech from being over-attenuated. The lower bound on the gain factor is obtained by keeping the speech distortion smaller than the residual noise. Accordingly, the corresponding lower bound on the NMT is also obtained. This lower bound on the gain factor must be adapted to the noise level to reduce the speech distortion and minimize the musical residual noise. Experimental results show that the enhanced speech sounds more natural and that the musical residual noise is almost inaudible.
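The thresholding step described above can be illustrated with a minimal sketch. The abstract does not state the threshold formula, so the adaptation rule below is an assumption chosen only to reproduce the stated behavior (a lower threshold in speech-dominated subbands and when masking is strong); the names `soft_threshold`, `adapt_threshold`, `base`, `snr_db`, `nmt`, and `noise_power` are hypothetical and do not come from the dissertation.

```python
import math


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`; small ones become zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def adapt_threshold(base, snr_db, nmt, noise_power):
    """Illustrative adaptation rule (an assumption, not the thesis formula).

    The threshold shrinks as the segmental SNR rises (speech-dominated subband)
    and as the noise masking threshold grows relative to the noise power.
    Requires noise_power + nmt > 0.
    """
    snr_weight = 1.0 / (1.0 + 10.0 ** (snr_db / 10.0))  # 0.5 at 0 dB, toward 0 at high SNR
    mask_weight = noise_power / (noise_power + nmt)      # toward 0 when masking is strong
    return base * snr_weight * mask_weight


noisy_wcs = [0.9, -0.2, 0.05, -1.4]  # made-up coefficients of one subband
t = adapt_threshold(base=0.4, snr_db=-5.0, nmt=0.01, noise_power=0.09)
enhanced_wcs = soft_threshold(noisy_wcs, t)
```

In this sketch a single threshold is computed per subband and applied to every coefficient in it, which matches the per-subband wording of the abstract; the actual weighting used in the dissertation may differ.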
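The interplay between the perceptual constraint and the lower bound can be sketched under a simple model that is assumed here, not taken from the dissertation: a gain g applied to a subband leaves residual noise power g² · N and speech distortion power (1 − g)² · S, where N and S are the noise and speech powers. Under that assumption, requiring the residual noise to stay below the NMT gives g ≤ sqrt(NMT / N), and requiring the distortion to stay below the residual noise gives g ≥ sqrt(S) / (sqrt(S) + sqrt(N)). All function and parameter names are hypothetical.

```python
import math


def perceptual_gain(noise_power, nmt):
    """Largest gain g in [0, 1] such that g**2 * noise_power <= nmt."""
    if noise_power <= nmt:
        return 1.0  # residual noise is already masked, so keep the coefficients
    return math.sqrt(nmt / noise_power)


def gain_lower_bound(speech_power, noise_power):
    """Smallest g such that (1 - g)**2 * speech_power <= g**2 * noise_power."""
    s = math.sqrt(speech_power)
    n = math.sqrt(noise_power)
    if s + n == 0.0:
        return 0.0
    return s / (s + n)


def bounded_gain(speech_power, noise_power, nmt):
    """Perceptual gain, floored so the speech is not over-attenuated."""
    return max(perceptual_gain(noise_power, nmt),
               gain_lower_bound(speech_power, noise_power))


# Strong speech with weak masking: the floor takes over.
g = bounded_gain(speech_power=9.0, noise_power=1.0, nmt=0.01)
```

When the noise is overestimated, `perceptual_gain` drops toward zero; the floor from `gain_lower_bound` is what keeps the speech from being muffled in this sketch.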
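Contents
Abstract
Acknowledgements
Contents
Chapter 1 Introduction
Chapter 2 Critical-Band Wavelet-Packet Transform
Chapter 3 Suppressing Wavelet Coefficients Using the Auditory Masking Effect
Chapter 4 Speech Enhancement System Using a Hybrid Gain Factor
Chapter 5 Conclusions
English-language doctoral dissertation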
Contents
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Single Channel Speech Enhancement
1.2 Wavelet Representation of a Signal
1.3 Problems and Approaches
1.4 Organizations of the Dissertation
Chapter 2 Critical-Band-Wavelet-Packet Transform for Speech Enhancement
2.1 A Brief Review of Wavelet Transform
2.2 Computation of Wavelet Coefficient
2.3 Structure of Critical-Band-Wavelet-Packet Decomposition
2.4 Estimation of Noise Masking Threshold
2.5 Experimental Environments
2.6 Performance Evaluation
Chapter 3 Thresholding the Wavelet Coefficients Based on Masking Property
3.1 Survey of Previous Works
3.2 Proposed Speech Enhancement System
3.3 Thresholding Wavelet Coefficients
3.4 Mean Square Error of the Enhanced Speech
3.5 Proposed Wavelet Coefficients Threshold
3.6 Procedure of the Proposed Algorithm
3.7 Experiments and Performance Evaluation
3.8 Discussions
Chapter 4 Hybrid Gain Factor in Critical-Band-Wavelet-Packet Transform
4.1 Survey of Previous Works
4.2 Derivation of Gain Factors
4.3 Noise Estimation
4.4 Experiments and Performance Evaluation
4.5 Discussions
Chapter 5 Conclusions
5.1 Summary of Results
5.2 Contributions
Appendix
A. List of Abbreviations and Symbols
B. Mean Square Error for the WCs below the WCT
C. Detailed Procedure of Modified Minimum Statistics Noise Estimation Algorithm
References
Publication List