National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 林季暉
Author (English): Chi-hui Lin
Title: 以類神經網路深度學習的語音增強方法
Title (English): A Modified Deep Neural Network Speech Enhancement Model
Advisor: 王益文
Committee: 王壘, 陳德生
Defense date: 2016-07-18
Degree: Master's
Institution: 逢甲大學 (Feng Chia University)
Department: 資訊工程學系 (Information Engineering)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2016
Academic year of graduation: 104
Language: Chinese
Pages: 48
Keywords (Chinese): 深度類神經網路, 語音增強, 彈性傳播演算法
Keywords (English): Deep Neural Networks, Speech Enhancement, Resilient Propagation
Statistics:
  • Cited by: 1
  • Views: 475
  • Downloads: 95
  • Bookmarked: 1
Abstract (translated from the Chinese): Because modern mobile devices place ever-higher demands on speech quality, algorithms that improve speech intelligibility are urgently needed. This thesis develops a speech-enhancement system based on deep multi-layer neural networks, building on the work of Y. Xu and modifying both the DNN architecture and the MLP learning algorithm so that the DNNs learn more efficiently. The DNN architecture consists of a deep MLP formed by unrolling a stack of RBMs, followed by one additional MLP layer; in this final layer every neuron has a linear activation function and the weights are initialized to the identity matrix. To train the DNN, conventional back-propagation is abandoned and replaced with resilient propagation. Experiments on the NOIZEUS speech dataset verify that these three adjustments accelerate DNN learning. Four characteristic domains of the speech data are analyzed: noise intensity, noise type, sentence, and speaker gender. Identifying the key characteristics in each domain and the relationships among them allows the training set to be reduced and the learning burden lightened. Finally, to control the parameters of the DNN speech-learning model effectively, the relationship between the DNNs' learning results and the quality of the enhanced speech is analyzed.
Speech intelligibility is more essential than before because more and more mobile devices need to improve speech quality. This thesis develops a more efficient deep neural network (DNN) speech-enhancement model based on Y. Xu's DNN speech-enhancement model, modifying both the structure of the DNNs and the learning algorithm of the MLPs to achieve efficient DNN learning. A deep MLP is composed by unrolling a stack of RBMs and appending one layer to the last stage of the MLP. In this last layer, each neuron has a linear activation function, and the weights are initialized to the identity matrix. Instead of back-propagation, resilient propagation is employed to train the DNN. Several experiments on the NOIZEUS speech dataset verify that these three modifications speed up DNN learning. The key characteristics of, and mutual relationships among, noise intensities, noise types, sentences, and speaker gender are also identified to reduce the size of the training dataset. Finally, to control the parameters of the DNN speech-enhancement model effectively, correlations between the DNNs' learning results and the quality of the enhanced speech are analyzed.
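Two of the modifications named in the abstract, a linear output layer initialized to the identity matrix and resilient propagation (RPROP, Riedmiller and Braun [17]) in place of plain back-propagation, can be sketched as follows. This is a minimal toy illustration, not the thesis code: the dimensions, hyper-parameters, and synthetic data are assumptions, and the RPROP variant shown omits weight backtracking.

```python
import numpy as np

# Sketch of two modifications from the abstract (toy setting, assumed shapes):
#  1. the final linear layer starts as an identity mapping;
#  2. weights are updated with RPROP, which uses only the SIGN of the
#     gradient together with a per-weight adaptive step size.

rng = np.random.default_rng(0)
d = 8                            # feature dimension (assumed)
W = np.eye(d)                    # final linear layer, identity-initialized

X = rng.normal(size=(200, d))    # stand-in "noisy" features (toy data)
T = X @ (0.9 * np.eye(d))        # stand-in "clean" targets (toy mapping)

step = np.full((d, d), 0.01)     # per-weight step sizes (Delta)
prev_sign = np.zeros((d, d))
eta_plus, eta_minus = 1.2, 0.5   # classic RPROP step-scaling factors
step_min, step_max = 1e-6, 1.0

for epoch in range(200):
    err = X @ W - T                      # forward pass of the linear layer
    grad = X.T @ err / len(X)            # gradient of 0.5 * MSE w.r.t. W
    sign = np.sign(grad)
    same = prev_sign * sign              # >0: same direction, <0: flipped
    step = np.where(same > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same < 0, np.maximum(step * eta_minus, step_min), step)
    sign = np.where(same < 0, 0.0, sign) # skip the update after a sign flip
    W -= sign * step                     # update ignores gradient magnitude
    prev_sign = sign

print(float(np.mean((X @ W - T) ** 2)))  # mean-squared error, small after training
```

Because the update direction depends only on the gradient's sign, RPROP is insensitive to the vanishing gradient magnitudes that slow plain back-propagation in deep nets, which is presumably why the thesis adopts it; the identity initialization lets the appended layer start as a harmless pass-through.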
Acknowledgements i
Abstract (Chinese) ii
Abstract iii
Contents iv
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Thesis Organization 2
Chapter 2 Related Work on Neural Networks 3
2.1 Multilayer Perceptrons and the Gradient Descent Algorithm 4
2.1.1 Resilient Propagation Algorithm 7
2.2 Restricted Boltzmann Machines and Contrastive Divergence Learning 8
Chapter 3 Methodology 12
3.1 DNN-Based Speech-Enhancement Learning System 13
3.1.1 Feature Extraction 14
3.1.2 Waveform Reconstruction 15
3.2 Deep Neural Network Model 16
3.2.1 Pretraining DNNs with Noisy Features 17
3.2.2 Training DNNs with Noisy and Clean Features 17
3.3 Speech-Enhancement System 19
Chapter 4 Experiments 20
4.1 Implementation Details 20
4.2 Verification that the Modified DNNs Learn More Efficiently 22
4.2.1 Improvement of the Gradient Descent Algorithm 22
4.2.2 Improvement of the Initial Final-Layer Weights of the MLPs 23
4.2.3 Improvement of the DNN Architecture 23
4.3 Effect of Different Training Speech Sets on Learning 25
4.3.1 Effect of Noise Intensity on Speech Learning 27
4.3.2 Effect of Noise Type on Speech Learning 29
4.3.3 Effect of Sentences on Speech Learning 31
4.3.4 Effect of Speakers on Speech Learning 34
4.4 Relationship Between DNN Learning and Speech Enhancement 35
4.4.1 Enhancement After DNNs Learn Magnitude Information 36
4.4.2 Hypothetical Enhancement If DNNs Learned Full Frequency-Domain Information 40
Chapter 5 Conclusion 44
References 46
[1] D. Burshtein and S. Gannot, “Speech enhancement using a mixture-maximum model,” IEEE Trans. on Speech and Audio Processing, vol. 10, no. 6, pp. 341-351, 2002.
[2] L. Bahl, P. Brown, P. de Souza, and R. Mercer, “Maximum mutual information estimation of hidden Markov model parameters for speech recognition,” Proceedings of the ICASSP, pp. 49-52, 1986.
[3] Y. Bengio, “Learning deep architectures for AI,” Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1-127, 2009.
[4] I. Cohen and S. Gannot, “Spectral enhancement methods,” in Springer Handbook of Speech Processing, J. Benesty, M. M. Sondhi, and Y. Huang, Eds. Berlin, Germany: Springer, pp. 873-901, 2008.
[5] L. Deng, “Computational models for speech production,” in Computational Models of Speech Pattern Processing, pp. 199-213, Springer-Verlag, New York, 1999.
[6] D. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. on ASSP, vol. 32, no. 2, pp. 236-243, 1984.
[7] S. Haykin, Feedforward Neural Networks: An Introduction, Wiley, 1998.
[8] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771-1800, 2002.
[9] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, pp. 1527-1554, 2006.
[10] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504-507, 2006.
[11] G. E. Hinton, L. Deng, D. Yu, and G. E. Dahl, “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, 2012.
[12] H. Hirsch and D. Pearce, “The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions,” ISCA ITRW ASR2000, Paris, France, September 18-20, 2000.
[13] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proceedings of the National Academy of Sciences, vol. 79, pp. 2554-2558, 1982.
[14] B.-K. Lee and J.-H. Chang, “Packet loss concealment based on deep neural networks for digital speech transmission,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 2, Feb. 2016.
[15] P. C. Loizou, Speech Enhancement: Theory and Practice, Taylor and Francis, 2007.
[16] Y. Hu and P. C. Loizou, “Subjective comparison and evaluation of speech enhancement algorithms,” Speech Communication, vol. 49, no. 7, pp. 588-601, 2007.
[17] M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm,” Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, April 1993.
[18] IEEE Subcommittee, “IEEE recommended practice for speech quality measurements,” IEEE Trans. Audio and Electroacoustics, vol. AU-17, no. 3, pp. 225-246, 1969.
[19] J. Tchorz and B. Kollmeier, “SNR estimation based on amplitude modulation analysis with applications to noise suppression,” IEEE Trans. Speech Audio Process., vol. 11, no. 3, pp. 184-192, May 2003.
[20] P. J. Werbos, “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences,” PhD thesis, Harvard University, 1974.
[21] S. I. Tamura, “An analysis of a noise reduction neural network,” Proc. ICASSP, 1989, pp. 2001-2004.
[22] E. A. Wan and A. T. Nelson, “Networks for speech enhancement,” in Handbook of Neural Networks for Speech Processing, S. Katagiri, Ed. Norwell, MA, USA: Artech House, 1998.
[23] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, “An experimental study on speech enhancement based on deep neural networks,” IEEE Signal Process. Lett., vol. 21, no. 1, pp. 65-68, Jan. 2014.