Graduate student (English name): Huang, Wen-Yu
Thesis title (English): Enhancement of Automatic Assessment System for Pre-service Principals’ Oral Presentation using Speech Attribute-enriched Multi-level Feature
Advisor (English name): Lee, Chi-Chun
Oral defense committee (English names): Tsao, Yu; Chen, Yi-Shin; Chen, Yun-Nung
Keywords (English): behavior signal processing; educational research; oral presentation; natural language processing; self-defined attribute tag; topic model
Amid a growing need for domain-aware computational models that can perform large-scale assessment as domain experts do, developing an automatic oral presentation assessment system is important for education researchers. In this work, we extend the previous audiovisual framework for pre-service school principals’ 3-minute impromptu speeches by adding lexical information as an additional modality. We aim to find an effective feature set for text and to improve the performance of the lexical modality using manually tagged information. First, we use a multi-level feature extraction approach, consisting of distributed representations and word categories, to derive features from the transcripts in the 2014 National Academy for Educational Research (NAER) oral presentation database, which improves the lexical modality's Spearman correlation from 0.378 to 0.493. Furthermore, inspired by folksonomy, we propose to enhance the lexical features with self-defined attribute tags of the speech transcripts. We carry out two experiments: Exp I) treating the tags as additional labels and employing multi-label learning, and Exp II) deriving features inspired by the tags and topic modeling. After incorporating both methods, the improved system achieves a Spearman correlation of 0.574. Our experiments demonstrate that self-defined attribute tags can enrich the lexical modality and improve the system.
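The abstract reports every result as a Spearman correlation between system output and expert ratings. As a minimal sketch of that metric only (not the thesis's evaluation code), the snippet below computes it as the Pearson correlation of rank vectors, using average ranks for ties. The function names `average_ranks` and `spearman` and the example scores are hypothetical and chosen for illustration; they are not data from the NAER database.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical expert and predicted scores, for illustration only.
expert_scores = [78.0, 85.0, 62.0, 90.0, 71.0]
predicted_scores = [74.0, 80.0, 65.0, 88.0, 79.0]
rho = spearman(expert_scores, predicted_scores)
```

Because the metric depends only on rank order, it rewards a system that orders speakers the way the experts do, even when the predicted scores sit on a different scale.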
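Authorization Form
Thesis Advisor's Recommendation Letter
Degree Examination Committee Approval Certificate
Acknowledgments
Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation
1.3 Thesis Organization
Chapter 2 Database
2.1 Pre-service Principals' Impromptu Speech Corpus
2.2 Score Normalization and Speech Attribute Annotation
2.2.1 Rank Label Normalization
2.2.2 Speech Attribute Annotation
Chapter 3 Methodology
3.1 Classical Vector Space Model
3.2 Distributed Representations
3.2.1 Word Vectors
3.2.2 Document Vectors
3.3 Multi-label Learning
3.3.1 Joint Feature Learning
3.3.2 Gram Matrix Combination
3.4 Topic Models
3.4.1 Latent Dirichlet Allocation
3.4.2 Latent Topic Vectorization
Chapter 4 Experimental Design, Results, and Analysis
4.1 Preliminary Experiment: Multi-level Document Features
4.1.1 Experimental Concept and Design
4.1.2 Results and Discussion
4.2 Experiment I: Speech Attribute Annotations as Labels
4.2.1 Experimental Concept and Design
4.2.2 Results and Discussion
4.3 Experiment II: Speech Attribute Annotations as Features
4.3.1 Experimental Concept and Design
4.3.2 Results and Discussion
4.4 Results of Combined Attribute Annotation Information
Chapter 5 Conclusion
References
Appendix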