[1] K. S. Ahmad et al., "A unique approach in text independent speaker recognition using MFCC feature sets and probabilistic neural network," in Proceedings of the Eighth International Conference on Advances in Pattern Recognition (ICAPR), 2015.
[2] Y. Wang and B. Lawlor, "Speaker recognition based on MFCC and BP neural networks," in Proceedings of the 28th Irish Signals and Systems Conference (ISSC), 2017.
[3] X. Zhao, Y. Wang, and D. Wang, "Deep neural networks for cochannel speaker identification," in Proceedings of ICASSP, 2015.
[4] Y. Hoshen, R. J. Weiss, and K. W. Wilson, "Speech acoustic modeling from raw multichannel waveforms," in Proceedings of ICASSP, 2015.
[5] W. Dai et al., "Very deep convolutional neural networks for raw waveforms," in Proceedings of ICASSP, 2017.
[6] T. N. Sainath et al., "Learning the speech front-end with raw waveform CLDNNs," in Proceedings of INTERSPEECH, 2015.
[7] T.-S. Chi, class notes of Auditory and Acoustical Information Processing, Department of Communication Engineering, National Chiao Tung University, Taiwan, 2013.
[8] A. Morris, J.-L. Schwartz, and P. Escudier, "An information theoretical investigation into the distribution of phonetic information across the auditory spectrogram," Computer Speech & Language, vol. 7, no. 2, pp. 121–136, 1993.
[9] L. E. Humes and L. Roberts, "Speech-recognition difficulties of the hearing-impaired elderly: The contributions of audibility," Journal of Speech, Language, and Hearing Research, vol. 33, no. 4, pp. 726–735, 1990.
[10] B. C. J. Moore, "Perceptual consequences of cochlear hearing loss and their implications for the design of hearing aids," Ear and Hearing, vol. 17, no. 2, pp. 133–161, 1996.
[11] T. Chi, P. Ru, and S. A. Shamma, "Multi-resolution spectro-temporal analysis of complex sounds," J. Acoust. Soc. Am., vol. 118, no. 2, pp. 887–906, 2005.
[12] M. Elhilali, T. Chi, and S. Shamma, "A spectro-temporal modulation index (STMI) for assessment of speech intelligibility," Speech Communication, pp. 331–348, 2003.
[13] T.-S. Chi, T.-H. Lin, and C.-C. Hsu, "Spectro-temporal modulation energy based mask for robust speaker identification," J. Acoust. Soc. Am., vol. 131, no. 5, pp. EL368–EL374, 2012.
[14] T. E. Lin, C.-C. Hsu, Y.-C. Chen, J.-H. Chen, and T.-S. Chi, "Spectro-temporal modulation based singing detection combined with pitch-based grouping for singing voice separation," in Proceedings of INTERSPEECH, pp. 2920–2923, 2013.
[15] F. Yen, Y.-J. Luo, and T.-S. Chi, "Singing voice separation using spectro-temporal modulation features," in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp. 617–622, 2014.
[16] J. B. Fritz, M. Elhilali, S. V. David, and S. A. Shamma, "Auditory attention-focusing the searchlight on sound," Current Opinion in Neurobiology, vol. 17, no. 4, pp. 437–455, 2007.
[17] E. R. Hafter, A. Sarampalis, and P. Loui, "Auditory attention and filters," in Auditory Perception of Sound Sources, Springer US, pp. 115–142, 2008.
[18] M. Elhilali, J. Fritz, T. Chi, and S. Shamma, "Auditory cortical receptive fields: Stable entities with plastic abilities," J. Neuroscience, vol. 27, no. 39, pp. 10372–10382, 2007.
[19] Z.-Q. Wang and D. Wang, "A joint training framework for robust automatic speech recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 4, pp. 796–806, 2016.
[20] H. Sak, A. Senior, K. Rao, O. Irsoy, A. Graves, F. Beaufays, and J. Schalkwyk, "Learning acoustic frame labeling for speech recognition with recurrent neural networks," in Proceedings of ICASSP, pp. 4280–4284, 2015.
[21] D. S. Williamson, Y. Wang, and D. Wang, "Complex ratio masking for monaural speech separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 3, pp. 483–492, 2016.
[22] L.-Y. Yeh and T.-S. Chi, "Spectro-temporal modulations for robust speech emotion recognition," in Proceedings of INTERSPEECH, pp. 789–792, 2010.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[24] J. Masci, U. Meier, D. Ciresan, and J. Schmidhuber, "Stacked convolutional auto-encoders for hierarchical feature extraction," in International Conference on Artificial Neural Networks, pp. 52–59, 2011.
[25] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in Proceedings of ICASSP, pp. 4580–4584, 2015.
[26] O. Abdel-Hamid, A.-R. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proceedings of ICASSP, pp. 4277–4280, 2012.
[27] J. Chen, T. Baer, and B. C. J. Moore, "Effect of enhancement of spectral changes on speech intelligibility and clarity preferences for the hearing impaired," J. Acoust. Soc. Am., vol. 131, no. 4, pp. 2987–2998, 2012.
[28] T.-S. Chi and C.-C. Hsu, "Multiband analysis and synthesis of spectro-temporal modulations of Fourier spectrogram," J. Acoust. Soc. Am., vol. 129, no. 5, pp. EL190–EL196, 2011.
[29] Y. Zhang, M. Pezeshki, P. Brakel, S. Zhang, C. Laurent, Y. Bengio, and A. Courville, "Towards end-to-end speech recognition with deep convolutional neural networks," in Proceedings of INTERSPEECH, pp. 410–414, 2016.
[30] Z.-Q. Wang and D. Wang, "Robust speech recognition from ratio masks," in Proceedings of ICASSP, pp. 5720–5724, 2016.