|
[1] J. Y. Li, L. Deng, Y. F. Gong, and R. Haeb-Umbach, "An Overview of Noise-Robust Automatic Speech Recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 4, pp. 745-777, Apr 2014.
[2] J. Ming, T. J. Hazen, J. R. Glass, and D. A. Reynolds, "Robust speaker recognition in noisy conditions," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 5, pp. 1711-1723, Jul 2007.
[3] L. P. Yang and Q. J. Fu, "Spectral subtraction-based speech enhancement for cochlear implant patients in background noise," Journal of the Acoustical Society of America, vol. 117, no. 3, pp. 1001-1004, Mar 2005.
[4] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '79), 1979, vol. 4, pp. 208-211: IEEE.
[5] J. S. Lim and A. V. Oppenheim, "All-Pole Modeling of Degraded Speech," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 3, pp. 197-210, 1978.
[6] Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109-1121, 1984.
[7] K. W. Wilson, B. Raj, P. Smaragdis, and A. Divakaran, "Speech denoising using nonnegative matrix factorization with priors," 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-12, pp. 4029-+, 2008.
[8] J. Tchorz and B. Kollmeier, "SNR estimation based on amplitude modulation analysis with applications to noise suppression," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 3, pp. 184-192, May 2003.
[9] X. G. Lu, Y. Tsao, S. Matsuda, and C. Hori, "Speech Enhancement Based on Deep Denoising Autoencoder," 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Vols 1-5, pp. 436-440, 2013.
[10] Y. Xu, J. Du, L. R. Dai, and C. H. Lee, "A Regression Approach to Speech Enhancement Based on Deep Neural Networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7-19, Jan 2015.
[11] S. W. Fu, Y. Tsao, and X. G. Lu, "SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement," 17th Annual Conference of the International Speech Communication Association (Interspeech 2016), Vols 1-5, pp. 3768-3772, 2016.
[12] A. L. Maas, Q. V. Le, T. M. O'Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, "Recurrent Neural Networks for Noise Reduction in Robust ASR," 13th Annual Conference of the International Speech Communication Association (Interspeech 2012), Vols 1-3, pp. 22-25, 2012.
[13] F. Weninger et al., "Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR," in International Conference on Latent Variable Analysis and Signal Separation, 2015, pp. 91-99: Springer.
[14] P. G. Shivakumar and P. G. Georgiou, "Perception Optimized Deep Denoising AutoEncoders for Speech Enhancement," in INTERSPEECH, 2016, pp. 3743-3747.
[15] S.-W. Fu, T.-W. Wang, Y. Tsao, X. Lu, and H. Kawai, "End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 9, pp. 1570-1584, 2018.
[16] Y. Zhao, B. Xu, R. Giri, and T. Zhang, "Perceptually guided speech enhancement using deep neural networks," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5074-5078, 2018.
[17] M. Kolbæk, Z.-H. Tan, and J. Jensen, "Monaural speech enhancement using deep neural networks by maximizing a short-time objective intelligibility measure," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5059-5063, 2018.
[18] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "An algorithm for intelligibility prediction of time-frequency weighted noisy speech," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125-2136, 2011.
[19] Y. Hu and P. C. Loizou, "Evaluation of objective quality measures for speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, pp. 229-238, Jan 2008.
[20] Y. Koizumi, K. Niwa, Y. Hioka, K. Kobayashi, and Y. Haneda, "DNN-Based Source Enhancement Self-Optimized by Reinforcement Learning Using Sound Quality Measurements," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 81-85, 2017.
[21] P. C. Loizou, Speech enhancement: theory and practice. CRC Press, 2007.
[22] G. Heinzel, A. Rüdiger, and R. Schilling, "Spectrum and spectral density estimation by the Discrete Fourier transform (DFT), including a comprehensive list of window functions and some new flat-top windows," 2002.
[23] S. S. Stevens, J. Volkmann, and E. B. Newman, "A scale for the measurement of the psychological magnitude pitch," The Journal of the Acoustical Society of America, vol. 8, no. 3, pp. 185-190, 1937.
[24] E. Zwicker, "Subdivision of the audible frequency range into critical bands (Frequenzgruppen)," The Journal of the Acoustical Society of America, vol. 33, no. 2, pp. 248-248, 1961.
[25] D. Baby, T. Virtanen, J. F. Gemmeke, and H. van Hamme, "Coupled Dictionaries for Exemplar-Based Speech Enhancement and Automatic Speech Recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, pp. 1788-1799, Nov 2015.
[26] F. Weninger, J. R. Hershey, J. Le Roux, and B. Schuller, "Discriminatively Trained Recurrent Neural Networks for Single-Channel Speech Separation," 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 577-581, 2014.
[27] D. L. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," Speech Separation by Humans and Machines, pp. 181-197, 2005.
[28] A. Narayanan and D. L. Wang, "Ideal Ratio Mask Estimation Using Deep Neural Networks for Robust Speech Recognition," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7092-7096, 2013.
[29] D. S. Williamson, Y. X. Wang, and D. L. Wang, "Complex Ratio Masking for Joint Enhancement of Magnitude and Phase," 2016 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, pp. 5220-5224, 2016.
[30] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-+, Jan 28 2016.
[31] V. Mnih et al., "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
[32] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3, pp. 279-292, May 1992.
[33] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 1998.
[34] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, Feb 26 2015.
[35] F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in 2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011, pp. 24-29: IEEE.
[36] Y. Xu, J. Du, L. R. Dai, and C. H. Lee, "An Experimental Study on Speech Enhancement Based on Deep Neural Networks," IEEE Signal Processing Letters, vol. 21, no. 1, pp. 65-68, Jan 2014.
[37] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett, "DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1," NASA STI/Recon Technical Report N, vol. 93, 1993.
[38] J. Barker, R. Marxer, E. Vincent, and S. Watanabe, "The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines," in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015, pp. 504-511: IEEE.
[39] ITU-T Rec. P.800, "Methods for subjective determination of transmission quality," International Telecommunication Union, Geneva, 1996.
[40] A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ) - A new method for speech quality assessment of telephone networks and codecs," 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols I-VI, Proceedings, pp. 749-752, 2001.
[41] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "A short-time objective intelligibility measure for time-frequency weighted noisy speech," in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010, pp. 4214-4217: IEEE.
[42] A. Irpan, "Deep Reinforcement Learning Doesn't Work Yet," 2018.
|