References
[1] World Health Organization. World Health Statistics Overview 2019: Monitoring Health for the Sustainable Development Goals (SDGs). 2019. Available online: https://apps.who.int/iris/bitstream/handle/10665/311696/WHO-DAD-2019.1-eng.pdf (accessed on 8 August 2022).
[2] Ministry of Health and Welfare. 2018 Taiwan Health and Welfare Report. 2018. Available online: https://www.mohw.gov.tw/cp-137-47558-2.html (accessed on 8 August 2022).
[3] N. Sadoughi and C. Busso. "Speech-driven animation with meaningful behaviors," Speech Communication, vol. 110, pp. 90-100, 2019.
[4] C.-M. Huang and B. Mutlu. "Learning-based modeling of multimodal behaviors for humanlike robots," in Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, Bielefeld, Germany, 2014, pp. 57-64.
[5] M. Kipp. Gesture Generation by Imitation: From Human Behavior to Computer Character Animation, Universal-Publishers: Irvine, California, 2005.
[6] S. Levine, P. Krahenbuhl, S. Thrun, and V. Koltun. "Gesture controllers," Transactions on Graphics, vol. 29, no. 4, pp. 1-11, 2010.
[7] Y. Ferstl, M. Neff, and R. McDonnell. "Multi-objective adversarial gesture generation," in Proceedings of the ACM SIGGRAPH Conference on Motion, Interaction and Games, Newcastle Upon Tyne, United Kingdom, 2019, pp. 1-10.
[8] S. Ginosar, A. Bar, G. Kohavi, C. Chan, A. Owens, and J. Malik. "Learning individual styles of conversational gesture," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, California, 2019, pp. 3497-3506.
[9] S. Alexanderson, G. E. Henter, T. Kucherenko, and J. Beskow. "Style-controllable speech-driven gesture synthesis using normalizing flows," Computer Graphics Forum, vol. 39, no. 2, pp. 487-496, 2020.
[10] Y. Yoon, W.-R. Ko, M. Jang, J. Lee, J. Kim, and G. Lee. "Robots learn social skills: end-to-end learning of co-speech gesture generation for humanoid robots," in Proceedings of the IEEE International Conference on Robotics and Automation, Montreal, Canada, 2019, pp. 4303-4309.
[11] T. Kucherenko, P. Jonell, S. van Waveren, G. E. Henter, S. Alexanderson, I. Leite, and H. Kjellstrom. "Gesticulator: a framework for semantically-aware speech-driven gesture generation," in Proceedings of the ACM International Conference on Multimodal Interaction, Utrecht, Netherlands, 2020, pp. 242-250.
[12] A. B. Hostetter and A. L. Potthoff. "Effects of personality and social situation on representational gesture production," Gesture, vol. 12, no. 1, pp. 62-83, 2012.
[13] T. Baltrušaitis, C. Ahuja, and L.-P. Morency. "Multimodal machine learning: a survey and taxonomy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 423-443, 2018.
[14] C. Ahuja and L.-P. Morency. "Language2Pose: natural language grounded pose forecasting," in Proceedings of the IEEE International Conference on 3D Vision, Quebec City, Canada, 2019, pp. 719-728.
[15] M. Roddy, G. Skantze, and N. Harte. "Multimodal continuous turn-taking prediction using multiscale RNNs," in Proceedings of the ACM International Conference on Multimodal Interaction, Boulder, Colorado, 2018, pp. 186-190.
[16] D. Bahdanau, K. Cho, and Y. Bengio. "Neural machine translation by jointly learning to align and translate," arXiv:1409.0473, 2015.
[17] A. Aristidou, E. Stavrakis, P. Charalambous, Y. Chrysanthou, and S. Loizidou Himona. "Folk dance evaluation using Laban movement analysis," Journal on Computing and Cultural Heritage, vol. 8, no. 4, pp. 1-19, 2015.
[18] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. "Improved techniques for training GANs," in Proceedings of the Conference on Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 2234-2242.
[19] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Proceedings of the Conference on Neural Information Processing Systems, Long Beach, California, 2017, pp. 6626-6637.
[20] K. Kilgour, M. Zuluaga, D. Roblek, and M. Sharifi. "Fréchet audio distance: a metric for evaluating music enhancement algorithms," arXiv:1812.08466, 2018.
[21] L. Medsker and L. C. Jain. Recurrent Neural Networks: Design and Applications, CRC Press: Boca Raton, Florida, 1999.
[22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. "Attention is all you need," in Proceedings of the Conference on Neural Information Processing Systems, Long Beach, California, 2017, pp. 5998-6008.
[23] J. Hu, L. Shen, and G. Sun. "Squeeze-and-excitation networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, 2018, pp. 7132-7141.
[24] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. "BERT: pre-training of deep bidirectional transformers for language understanding," arXiv:1810.04805, 2018.
[25] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. "Generative adversarial nets," in Proceedings of the Conference on Neural Information Processing Systems, Montreal, Canada, 2014, pp. 2672-2680.
[26] Y. Yoon, B. Cha, J.-H. Lee, M. Jang, J. Lee, J. Kim, and G. Lee. "Speech gesture generation from the trimodal context of text, audio, and speaker identity," Transactions on Graphics, vol. 39, no. 6, pp. 1-16, 2020.
[27] U. Bhattacharya, E. Childs, N. Rewkowski, and D. Manocha. "Speech2AffectiveGestures: synthesizing co-speech gestures with generative adversarial affective expression learning," in Proceedings of the ACM International Conference on Multimedia, Chengdu, China, 2021, pp. 2027-2036.