National Digital Library of Theses and Dissertations in Taiwan

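Graduate student: 彭江任
Graduate student (romanized): Peng, Chiang-Jen
Thesis title: 聲音三要素之多目標函數網路應用於語音增強演算法之研究
Thesis title (English): Sound characteristic based multi-objective network for speech enhancement
Advisor: 冀泰石
Advisor (romanized): Chi, Tai-Shih
Oral defense committee: 曹昱; 王新民
Oral defense committee (romanized): Tsao, Yu; Wang, Hsin-Min
Oral defense date: 2021-10-08
Degree: Master's
Institution: National Yang Ming Chiao Tung University
Department: Institute of Communications Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2021
Graduation academic year: 110 (ROC calendar)
Language: Chinese
Pages: 54
Keywords (translated from Chinese): three elements of sound; speech enhancement; neural network; deep learning; multi-objective network
Keywords (English): sound characteristic; speech enhancement; neural network; deep learning; multi-objective network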
With the rapid development of neural networks, deep learning has been widely studied and has achieved excellent results in many fields. Many studies have also shown that making effective use of features learned through deep learning can improve the performance of related research. In this thesis, we therefore use deep learning networks to obtain the three elements of sound as the speech features of our multi-objective network, and we design algorithms that make effective use of these features. We expect these features, which are closely related to sound, to improve the performance of our speech enhancement model.

We test the proposed model on the TMHINT (Taiwan Mandarin hearing in noise test) corpus. The experimental results show that, compared with the original speech enhancement model, the algorithm designed around these features improves speech quality, intelligibility, and the other quantitative measures.
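
The abstract describes a multi-objective network that trains a speech enhancement model together with auxiliary speech features, but this record does not give the loss formula. The following is only a minimal sketch of one common multi-task formulation: a primary enhancement loss plus a weighted sum of auxiliary feature losses. The function names, the use of mean squared error, the weight values, and the toy inputs are all assumptions made for illustration. The feature names follow the three elements listed in the table of contents (frequency, loudness, timbre).

```python
from typing import Dict, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_objective_loss(
    enhanced: Sequence[float],
    clean: Sequence[float],
    feature_preds: Dict[str, Sequence[float]],
    feature_targets: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> float:
    """Enhancement loss plus weighted auxiliary feature losses.

    Hypothetical formulation: MSE and the weights are assumptions,
    not the loss actually used in the thesis.
    """
    total = mse(enhanced, clean)  # primary speech enhancement objective
    for name, weight in weights.items():
        # One auxiliary objective per sound element
        total += weight * mse(feature_preds[name], feature_targets[name])
    return total


# Toy values chosen only to exercise the function
loss = multi_objective_loss(
    enhanced=[0.5, 1.0],
    clean=[1.0, 1.0],
    feature_preds={"frequency": [2.0], "loudness": [0.0], "timbre": [1.0, 3.0]},
    feature_targets={"frequency": [1.0], "loudness": [0.0], "timbre": [1.0, 1.0]},
    weights={"frequency": 0.5, "loudness": 0.5, "timbre": 0.25},
)
print(loss)
```

In a sketch like this, the weights control how strongly each auxiliary objective influences training relative to the enhancement objective. How the thesis balances these terms is not stated in this record.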
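Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Speech Enhancement Systems
2.1 Introduction to Speech Enhancement Systems
2.1.1 Speech Enhancement Systems That Predict Speech
2.1.2 Speech Enhancement Systems That Predict Noise
2.2 Traditional Speech Enhancement Systems
2.3 Neural Network Speech Enhancement Systems
2.3.1 Introduction to Neural Networks
2.3.2 Speech Enhancement Systems Built on Neural Network Models
Chapter 3 Multi-Objective Speech Enhancement Model Based on the Three Elements of Sound
3.1 Multi-Objective Learning
3.1.1 Introduction to Multi-Objective Learning
3.2 Introduction to the Three Elements of Sound
3.2.1 Frequency
3.2.2 Loudness
3.2.3 Timbre
3.3 Design of the Multi-Objective Speech Enhancement System Based on the Three Elements of Sound
3.3.1 Speech Enhancement System
3.3.2 Speech Features Inspired by the Three Elements of Sound
3.3.3 Deep Network Models for Predicting the Speech Features, and Results
3.3.4 Design of the Multi-Objective Loss Function
Chapter 4 Experimental Results and Discussion
4.1 Experiments and Parameter Settings
4.1.1 Experimental Procedure
4.1.2 Fixed and Fine-Tuned Speech Feature Network Models
4.1.3 Datasets and Noise Sources
4.1.4 Other Parameters
4.1.5 Quantitative Evaluation Metrics
4.2 Experiments, Part 1
4.3 Experiments, Part 2
4.4 Comparison of Experimental Results, Issues, and Discussion
4.4.1 Comparison of Experimental Results
4.4.2 Issues and Discussion
Chapter 5 Conclusion and Future Work
References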