[1] S. Narayanan and P. G. Georgiou, "Behavioral signal processing: Deriving human behavioral informatics from speech and language," Proceedings of the IEEE, vol. 101, pp. 1203-1233, 2013.
[2] S.-W. Hsiao, H.-C. Sun, M.-C. Hsieh, M.-H. Tsai, H.-C. Lin, and C.-C. Lee, "A multimodal approach for automatic assessment of school principals' oral presentation during pre-service training program," in Proc. INTERSPEECH, 2015.
[3] M. P. Black, A. Katsamanis, B. R. Baucom, C.-C. Lee, A. C. Lammert, A. Christensen, et al., "Toward automating a human behavioral coding system for married couples' interactions using speech acoustic features," Speech Communication, vol. 55, pp. 1-21, 2013.
[4] H.-Y. Chen, Y.-H. Liao, H.-T. Jan, L.-W. Kuo, and C.-C. Lee, "A Gaussian mixture regression approach toward modeling the affective dynamics between acoustically-derived vocal arousal score (VC-AS) and internal brain fMRI BOLD signal response," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 5775-5779.
[5] W.-C. Chen, P.-T. Lai, Y. Tsao, and C.-C. Lee, "Multimodal arousal rating using unsupervised fusion technique," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 5296-5300.
[6] A. Metallinou, Z. Yang, C.-C. Lee, C. Busso, S. Carnicke, and S. Narayanan, "The USC CreativeIT database of multimodal dyadic interactions: From speech and full body motion capture to continuous emotional annotations," Language Resources and Evaluation, vol. 50, pp. 497-521, 2016.
[7] F.-S. Tsai, Y.-L. Hsu, W.-C. Chen, Y.-M. Weng, C.-J. Ng, and C.-C. Lee, "Toward development and evaluation of pain level-rating scale for emergency triage based on vocal characteristics and facial expressions," in Proc. INTERSPEECH, 2016, pp. 92-96.
[8] D. Bone, C.-C. Lee, A. Potamianos, and S. S. Narayanan, "An investigation of vocal arousal dynamics in child-psychologist interactions using synchrony measures and a conversation-based model," in Proc. INTERSPEECH, 2014.
[9] E. Delaherche, M. Chetouani, F. Bigouret, J. Xavier, M. Plaza, and D. Cohen, "Assessment of the communicative and coordination skills of children with autism spectrum disorders and typically developing children using social signal processing," Research in Autism Spectrum Disorders, vol. 7, pp. 741-756, 2013.
[10] D. Bone, C.-C. Lee, M. P. Black, M. E. Williams, S. Lee, P. Levitt, et al., "The psychologist as an interlocutor in autism spectrum disorder assessment: Insights from a study of spontaneous prosody," Journal of Speech, Language, and Hearing Research, vol. 57, pp. 1162-1177, 2014.
[11] R. L. Spitzer and J. B. Williams, Diagnostic and Statistical Manual of Mental Disorders. Washington, DC: American Psychiatric Association, 1980.
[12] E. K. Delinicolas and R. L. Young, "Joint attention, language, social relating, and stereotypical behaviours in children with autistic disorder," Autism, vol. 11, pp. 425-436, 2007.
[13] J. Baio, "Prevalence of autism spectrum disorders: Autism and Developmental Disabilities Monitoring Network, 14 sites, United States, 2008," Morbidity and Mortality Weekly Report: Surveillance Summaries, vol. 61, no. 3, Centers for Disease Control and Prevention, 2012.
[14] C. Lord, S. Risi, L. Lambrecht, E. H. Cook, B. L. Leventhal, P. C. DiLavore, et al., "The Autism Diagnostic Observation Schedule-Generic: A standard measure of social and communication deficits associated with the spectrum of autism," Journal of Autism and Developmental Disorders, vol. 30, pp. 205-223, 2000.
[15] C. Lord, M. Rutter, and A. Le Couteur, "Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders," Journal of Autism and Developmental Disorders, vol. 24, pp. 659-685, 1994.
[16] P. Boersma and D. Weenink, "Praat: A system for doing phonetics by computer [Computer software]," Institute of Phonetic Sciences, University of Amsterdam, The Netherlands, 2003.
[17] R. Paul, A. Augustyn, A. Klin, and F. R. Volkmar, "Perception and production of prosody by speakers with autism spectrum disorders," Journal of Autism and Developmental Disorders, vol. 35, pp. 205-220, 2005.
[18] D. Ververidis and C. Kotropoulos, "Emotional speech recognition: Resources, features, and methods," Speech Communication, vol. 48, pp. 1162-1181, 2006.
[19] B. McFee, C. Raffel, D. Liang, D. P. Ellis, M. McVicar, E. Battenberg, et al., "librosa: Audio and music signal analysis in Python," in Proceedings of the 14th Python in Science Conference, 2015, pp. 18-25.
[20] P. Boersma, "Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound," in Proceedings of the Institute of Phonetic Sciences, 1993, pp. 97-110.
[21] M. K. Sönmez, L. Heck, M. Weintraub, and E. Shriberg, "A lognormal tied mixture model of pitch for prosody-based speaker recognition," 1997.
[22] M. Farrús, "Jitter and shimmer measurements for speaker recognition," in Proc. INTERSPEECH, Antwerp, Belgium, 2007, pp. 778-781.
[23] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu, "Action recognition by dense trajectories," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 3169-3176.
[24] H. Wang and C. Schmid, "Action recognition with improved trajectories," in Proc. IEEE International Conference on Computer Vision (ICCV), 2013, pp. 3551-3558.
[25] A. Tamrakar, S. Ali, Q. Yu, J. Liu, O. Javed, A. Divakaran, et al., "Evaluation of low-level features and their combinations for complex event detection in open source videos," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 3681-3688.
[26] J. Sun, Y. Mu, S. Yan, and L.-F. Cheong, "Activity recognition using dense long-duration trajectories," in Proc. IEEE International Conference on Multimedia and Expo (ICME), 2010, pp. 322-327.
[27] L. Baraldi, F. Paci, G. Serra, L. Benini, and R. Cucchiara, "Gesture recognition in ego-centric videos using dense trajectories and hand segmentation," in Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2014, pp. 688-693.
[28] F. Rosenblatt, "The perceptron: A probabilistic model for information storage and organization in the brain," Psychological Review, vol. 65, pp. 386-408, 1958.
[29] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533-536, 1986.
[30] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, pp. 504-507, 2006.
[31] R. J. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks," Neural Computation, vol. 1, pp. 270-280, 1989.
[32] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, pp. 1735-1780, 1997.
[33] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, pp. 157-166, 1994.
[34] S. Petridis and M. Pantic, "Deep complementary bottleneck features for visual speech recognition," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 2304-2308.
[35] J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, "Image classification with the Fisher vector: Theory and practice," International Journal of Computer Vision, vol. 105, pp. 222-245, 2013.
[36] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, pp. 273-297, 1995.
[37] B. Waske and J. A. Benediktsson, "Fusion of support vector machines for classification of multisensor data," IEEE Transactions on Geoscience and Remote Sensing, vol. 45, pp. 3858-3866, 2007.
[38] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.