National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

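Author: Huang, Shih-Lun (黃士倫)
Title: Time-Frequency Mask estimation combining RNN and Autoencoder for Speech Enhancement (結合遞迴神經網路與自編碼器之時頻遮罩估計應用於語音強化)
Advisor: Hu, Jwu-Sheng (胡竹生)
Committee members: Chieng, Wei-Hua (成維華); Chi, Tai-Shih (冀泰石)
Defense date: 2020-02-19
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Electrical and Control Engineering
Discipline: Engineering
Sub-discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2020
Graduation academic year: 108
Language: Chinese
Pages: 49
Keywords: recurrent neural network (RNN); autoencoder; optimal ratio mask; spectrogram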
Abstract: This thesis proposes a single-channel speech enhancement algorithm that combines a deep neural network with a least-squares filter. A recurrent neural network extracts temporal information and a deep feedforward network learns feature reconstruction; the two are combined into a denoising autoencoder that estimates the speech ratio and the noise ratio. These estimates are then fed into a compensated Wiener filter to compute the optimal ratio time-frequency mask. During training, multiple targets (speech, noise, and the cross-correlation between them) are used to improve network performance. The thesis also tries training the network on synthetic noise instead of real noise and tests the resulting enhancement under real environmental noise. The experiments cover stationary noise and crowd noise at different signal-to-noise ratios, and objective scoring systems are used to compare the proposed method with traditional algorithms.
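The mask computation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: this record does not give the compensated Wiener formula, so the cross-term expression below is an assumption based on a common statement of the optimal ratio mask. The function names `wiener_mask` and `ratio_mask_with_cross_term` are hypothetical, and oracle speech and noise spectra stand in for the ratios that the network would estimate.

```python
import numpy as np


def wiener_mask(speech_power, noise_power, eps=1e-12):
    """Classical Wiener gain per time-frequency bin: speech power over total power."""
    return speech_power / (speech_power + noise_power + eps)


def ratio_mask_with_cross_term(speech_spec, noise_spec, eps=1e-12):
    """Ratio mask that keeps the speech-noise cross term Re(S * conj(N)).

    Assumed form (not taken from the thesis):
        (|S|^2 + Re(S N*)) / (|S|^2 + |N|^2 + 2 Re(S N*))
    The denominator equals |S + N|^2, the power of the noisy mixture.
    """
    s_pow = np.abs(speech_spec) ** 2
    n_pow = np.abs(noise_spec) ** 2
    cross = np.real(speech_spec * np.conj(noise_spec))
    return (s_pow + cross) / (s_pow + n_pow + 2.0 * cross + eps)


if __name__ == "__main__":
    # Toy complex spectra (frames x frequency bins), purely illustrative.
    rng = np.random.default_rng(0)
    shape = (4, 8)
    speech = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
    mixture = speech + noise

    mask = ratio_mask_with_cross_term(speech, noise)
    enhanced = mask * mixture  # apply the real-valued mask to the noisy spectrum
    print(mask.shape, enhanced.dtype)
```

When the cross term is zero, the second mask reduces to the classical Wiener gain. Dropping that term is the usual simplification when speech and noise are assumed uncorrelated.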
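Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Literature Review
1.4 Thesis Organization
Chapter 2 Single-Channel Speech Enhancement
2.1 Spectrogram
2.1.1 Short-Time Fourier Transform
2.1.2 Discrete Spectrogram
2.2 Voice Activity Detection
2.3 Signal-to-Noise Ratio
2.4 Gain Function
2.4.1 Wiener Filter
2.4.2 Cross-Correlation Compensation
Chapter 3 Deep Learning and Neural Networks
3.1 Neuron
3.2 Deep Neural Network
3.3 Nonlinear Layers
3.3.1 Activation Function
3.3.2 Pooling Layer
3.4 Parameter Optimization
3.5 Recurrent Networks
3.5.1 Recurrent Neural Network
3.5.2 Bidirectional Recurrent Network
3.5.3 Stacked Recurrent Network
3.5.4 Gated Recurrent Unit
3.6 Denoising Autoencoder
3.6.1 Autoencoder
3.6.2 Denoising Autoencoder
Chapter 4 System Architecture
4.1.1 Network Architecture
4.1.2 Network Design Rationale
4.2 Training Method
4.2.1 Speech Data
4.2.2 Noise Generator
4.2.3 Training Targets and Parameter Settings
Chapter 5 Experimental Results and Analysis
5.1 Objective Scoring System
5.2 Synthetic Noise Tests
5.2.1 White Noise
5.2.2 Pink Noise
5.2.3 Crowd Noise
Chapter 6 Conclusion
6.1 Research Results
6.2 Future Work
References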