[1] B. E. Kingsbury, N. Morgan, and S. Greenberg, “Robust speech recognition using the modulation spectrogram,” Speech Communication, vol. 25, no. 1, pp. 117–132, 1998.
[2] G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002.
[3] F. Mörchen, A. Ultsch, M. Thies, and I. Löhken, “Modeling timbre distance with temporal statistics from polyphonic music,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 81–90, 2006.
[4] D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, “Detection and classification of acoustic scenes and events,” IEEE Transactions on Multimedia, vol. 17, no. 10, pp. 1733–1746, 2015.
[5] D. Barchiesi, D. Giannoulis, D. Stowell, and M. D. Plumbley, “Acoustic scene classification: Classifying environments from the sounds they produce,” IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 16–34, 2015.
[6] M. H. Moattar and M. M. Homayounpour, “A review on speaker diarization systems and approaches,” Speech Communication, vol. 54, no. 10, pp. 1065–1103, 2012.
[7] M. McKinney and J. Breebaart, “Features for audio and music classification,” in Proceedings of the International Symposium on Music Information Retrieval, pp. 151–158, 2003.
[8] M. A. Hossan, S. Memon, and M. A. Gregory, “A novel approach for MFCC feature extraction,” in 2010 4th International Conference on Signal Processing and Communication Systems (ICSPCS), pp. 1–5, IEEE, 2010.
[9] S. Greenberg and B. E. Kingsbury, “The modulation spectrogram: In pursuit of an invariant representation of speech,” in 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 3, pp. 1647–1650, IEEE, 1997.
[10] S. Sukittanon, L. E. Atlas, and J. W. Pitton, “Modulation-scale analysis for content identification,” IEEE Transactions on Signal Processing, vol. 52, no. 10, pp. 3023–3035, 2004.
[11] 何育澤, “Recognition of mixed sounds based on support vector machines” (in Chinese), National Tsing Hua University, 2014.
[12] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443–445, 1985.
[13] A. Oppenheim and R. Schafer, Discrete-Time Signal Processing. Prentice-Hall Signal Processing Series, Pearson, 2010.
[14] H. Hermansky, “Modulation spectrum in speech processing,” in Signal Analysis and Prediction, pp. 395–406, Springer, 1998.
[15] M. Markaki and Y. Stylianou, “Discrimination of speech from nonspeech in broadcast news based on modulation frequency features,” Speech Communication, vol. 53, no. 5, pp. 726–735, 2011.
[16] L. Atlas and S. A. Shamma, “Joint acoustic and modulation frequency,” EURASIP Journal on Advances in Signal Processing, vol. 2003, no. 7, p. 310290, 2003.
[17] G. Evangelopoulos and P. Maragos, “Multiband modulation energy tracking for noisy speech detection,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 2024–2038, 2006.
[18] J.-H. Bach, B. Kollmeier, and J. Anemüller, “Modulation-based detection of speech in real background noise: Generalization to novel background classes,” in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 41–44, IEEE, 2010.
[19] N. H. Sephus, A. D. Lanterman, and D. V. Anderson, “Exploring frequency modulation features and resolution in the modulation spectrum,” in 2013 IEEE Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE), pp. 169–174, IEEE, 2013.
[20] J.-M. Ren and J.-S. R. Jang, “Discovering time-constrained sequential patterns for music genre classification,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1134–1144, 2012.
[21] C.-H. Lee, J.-L. Shih, K.-M. Yu, and H.-S. Lin, “Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features,” IEEE Transactions on Multimedia, vol. 11, no. 4, pp. 670–682, 2009.
[22] M. Markaki and Y. Stylianou, “Voice pathology detection and discrimination based on modulation spectral features,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 1938–1948, 2011.
[23] S.-C. Lim, S.-J. Jang, S.-P. Lee, and M. Y. Kim, “Music genre/mood classification using a feature-based modulation spectrum,” in 2011 International Conference on Mobile IT Convergence (ICMIC), pp. 133–136, IEEE, 2011.
[24] D. Reynolds, “Gaussian mixture models,” Encyclopedia of Biometrics, pp. 827–832, 2015.
[25] D. A. Reynolds and R. C. Rose, “Robust text-independent speaker identification using Gaussian mixture speaker models,” IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 72–83, 1995.
[26] K. B. Petersen, M. S. Pedersen, et al., “The matrix cookbook,” Technical University of Denmark, vol. 7, p. 15, 2008.
[27] A. B. Downey, Think Complexity: Complexity Science and Computational Modeling, ch. 9, p. 91. O’Reilly Media, Inc., 2012.