
National Digital Library of Theses and Dissertations in Taiwan

Student: 李世光
Student (English): LEE, SHIH-KUANG
Thesis title: 整合降噪自編碼與離散小波轉換之語音強化
Thesis title (English): Speech Enhancement Based on the Integration of Denoising Auto-Encoder and Discrete Wavelet Transform
Advisor: 洪志偉
Advisor (English): HUNG, JEIH-WEIH
Oral defense committee: 林容杉, 洪志偉, 曹昱
Oral defense committee (English): LIN, JUNG-SHAN; HUNG, JEIH-WEIH; TSAO, YU
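Oral defense date: 2018-07-30
Degree: Master's
Institution: National Chi Nan University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2018
Graduation academic year: 107 (ROC calendar, i.e., 2018 to 2019)
Language: English
Number of pages: 52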
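Keywords (Chinese, translated): speech enhancement; spectrogram; denoising autoencoder; wavelet transform; modulation spectrum
Keywords (English): Speech Enhancement; Spectrogram; Denoising Auto-Encoder; Discrete Wavelet Transform; Modulation Spectrum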
Usage statistics: cited 0 times; viewed 198 times; downloaded 1 time; bookmarked 0 times
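In today's widespread speech applications, such as speech recognition, spoken information retrieval, and voice-controlled robots, speech enhancement techniques that remove noise interference play a very important role. In this thesis, we propose a novel speech enhancement technique that uses the discrete wavelet transform to decompose the temporal sequences of the speech spectrum and then suppresses their high-modulation-frequency components, thereby enhancing the speech. We further integrate this technique with the well-known denoising autoencoder (DAE) and propose two different integration methods.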
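The wavelet step described above can be illustrated with a small Python sketch. It is not the thesis's implementation: it assumes a magnitude spectrogram is already available as a list of frames, uses a one-level Haar wavelet (chosen only for brevity; the wavelet family and decomposition depth used in the thesis are not stated in this record), and scales the detail (high-pass) coefficients of each frequency bin's time trajectory by a hypothetical factor `alpha`.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: List[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar DWT of an even-length sequence."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]  # low-pass
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]  # high-pass
    return approx, detail


def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse of haar_dwt: rebuilds the even-length sequence."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def modwd_sketch(spec: List[List[float]], alpha: float = 0.0) -> List[List[float]]:
    """Attenuate the level-1 Haar detail of every frequency bin's trajectory.

    spec[t][k] is the magnitude of frame t, frequency bin k (assumed input).
    alpha scales the detail coefficients (hypothetical parameter):
    0 removes them, 1 leaves the spectrogram unchanged.
    """
    n_frames, n_bins = len(spec), len(spec[0])
    out = [[0.0] * n_bins for _ in range(n_frames)]
    for k in range(n_bins):
        traj = [spec[t][k] for t in range(n_frames)]
        if n_frames % 2:
            traj.append(traj[-1])  # repeat the last frame to get an even length
        approx, detail = haar_dwt(traj)
        rec = haar_idwt(approx, [alpha * d for d in detail])
        for t in range(n_frames):  # drop the padded frame, if any
            out[t][k] = rec[t]
    return out
```

For example, `modwd_sketch([[1.0, 4.0], [3.0, 0.0], [5.0, 2.0]])` returns approximately `[[2, 2], [2, 2], [5, 2]]`: with `alpha = 0` every pair of consecutive frames is replaced by its average, which removes the upper half of the modulation band of each trajectory while leaving slowly varying (low-modulation) structure intact.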
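Our preliminary evaluation experiments show, first, that when high-SNR training data are used, the corresponding DAE achieves significant noise reduction on average for test speech at various SNRs. Second, they verify that the newly proposed wavelet-based speech enhancement method improves speech quality. Finally, at a high frame rate, the wavelet-based method and DAE combine well, producing additive gains.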
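Why the frame rate might matter can be reasoned about with an idealized calculation, offered as an illustration rather than as the thesis's own analysis. Each spectrogram trajectory is sampled once per frame, so it spans modulation frequencies from 0 up to half the frame rate, and each DWT level ideally halves that band. The helper below (a hypothetical name) lists the band each subband would cover; real wavelet filters have non-ideal, overlapping transitions, so the edges are approximate, and the frame shifts used are examples only.

```python
from typing import List, Tuple

Band = Tuple[float, float]


def dwt_modulation_bands(frame_rate_hz: float, levels: int) -> Tuple[List[Band], Band]:
    """Idealized modulation-frequency range (Hz) of each DWT subband.

    A trajectory sampled once per frame spans 0 .. frame_rate_hz / 2.
    Level j detail ideally covers [nyq / 2**j, nyq / 2**(j - 1)], and the
    final approximation covers [0, nyq / 2**levels].
    """
    if frame_rate_hz <= 0 or levels < 1:
        raise ValueError("frame_rate_hz must be positive and levels >= 1")
    nyq = frame_rate_hz / 2.0
    details = [(nyq / 2 ** j, nyq / 2 ** (j - 1)) for j in range(1, levels + 1)]
    approx = (0.0, nyq / 2 ** levels)
    return details, approx


for shift_ms in (10.0, 2.5):  # example frame shifts, not the thesis's settings
    rate = 1000.0 / shift_ms
    details, approx = dwt_modulation_bands(rate, 2)
    print(f"{rate:.0f} Hz frame rate: details {details}, approximation {approx}")
```

With a 10 ms shift (100 Hz frame rate) the level-1 detail covers roughly 25 to 50 Hz; at 400 Hz it moves to roughly 100 to 200 Hz. Under this idealization, a higher frame rate pushes the suppressed band further from the low modulation frequencies where speech energy is usually concentrated, so attenuating it may remove noise with less damage to speech.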
Speech enhancement (SE), which reduces the effect of noise, plays an important role in today's widespread audio applications such as speech recognition, speech-based information retrieval, and voice control. In this study, we present a novel SE method that applies the discrete wavelet transform (DWT) to the temporal trajectories of the noisy-speech spectrogram and then attenuates the high-pass portion to reduce the noise effect. In addition, we investigate how the signal-to-noise ratio (SNR) of the training data affects the SE capability of a well-known SE method, the denoising auto-encoder (DAE). Finally, we propose two forms of integrating the newly proposed DWT-wise SE method with DAE.
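Training data at a chosen SNR are commonly produced by scaling a noise signal before adding it to clean speech; a DAE is then trained to map each noisy input back to its clean target. The sketch below assumes plain Python lists of samples and a noise recording at least as long as each utterance. The scale factor s follows from requiring 10·log10(P_clean / (s²·P_noise)) to equal the target SNR in dB. The function names are hypothetical, and the noise types and SNR levels used in the thesis are not given in this record.

```python
import math
import random
from typing import List, Sequence, Tuple


def power(x: Sequence[float]) -> float:
    """Mean-square power of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float],
               snr_db: float) -> Tuple[List[float], List[float]]:
    """Add noise to clean speech so the mixture has the requested SNR (dB).

    Solves 10*log10(P_clean / (s**2 * P_noise)) = snr_db for the scale s.
    Returns (noisy, scaled_noise).
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as clean")
    noise = noise[: len(clean)]
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    s = math.sqrt(power(clean) / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = [s * n for n in noise]
    return [c + n for c, n in zip(clean, scaled)], scaled


def make_training_pairs(clean_utts, noise, snr_levels_db, seed=0):
    """Hypothetical helper: one (noisy input, clean target) pair per utterance and SNR."""
    rng = random.Random(seed)
    pairs = []
    for clean in clean_utts:
        for snr in snr_levels_db:
            start = rng.randrange(len(noise) - len(clean) + 1)  # random noise segment
            noisy, _ = mix_at_snr(clean, noise[start:start + len(clean)], snr)
            pairs.append((noisy, list(clean)))
    return pairs
```

To compare training conditions as the abstract describes, the same clean utterances could be mixed once with a high-SNR list and once with a low-SNR list, and a separate DAE trained on each resulting set of pairs.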
Preliminary experimental results show that high-SNR training data give rise to better DAE performance and that the newly proposed DWT-wise SE method improves the quality of noisy speech. Furthermore, at a high frame rate, integrating the two methods outperforms either component method alone.
Abstract (Chinese)
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Motivation and Background
1.2 Thesis Outline
Chapter 2 Introduction to Speech Enhancement Techniques
2.1 Spectral Subtractive Algorithms
2.2 Wiener Filtering
Chapter 3 Autoencoder and Denoising Autoencoder
3.1 Autoencoder
3.2 Denoising Autoencoder
3.3 Speech Enhancement Based on a Denoising Autoencoder
Chapter 4 A Novel Modulation-based Speech Enhancement Method Based on Discrete Wavelet Transform
4.1 Discrete Wavelet Transform
4.2 Modulation-domain Wavelet Denoising (ModWD)
4.3 Integrating ModWD with Denoising Auto-Encoder
Chapter 5 Experimental Results and Discussions
5.1 Experimental Setup
5.2 Preliminary Evaluation Results of DAE with the Training Data with Different SNR Levels
5.3 Experimental Results for the Two Forms of Integration of ModWD and DAE
5.4 The Effect of Varying Frame Rates in Various Speech Enhancement Methods
5.5 Experimental Results for an Environment with Noise and Reverberance
5.6 Summary
Chapter 6 Conclusion and Future Work
References