
National Digital Library of Theses and Dissertations in Taiwan

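Author: Kuan-Chieh Huang (黃冠傑)
Thesis title: Multi-modal Emotion Recognition Method based on Computational Intelligence Techniques (基於計算智慧技術之多模情緒辨識方法)
Advisor: Yau-Hwang Kuo (郭耀煌)
Degree: Doctoral
Institution: National Cheng Kung University
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2014
Academic year of graduation: 102
Language: English
Number of pages: 115
Keywords (translated from the Chinese): Affective Communication; Human-Computer Interface; Emotion Recognition; Collaborative Decision Making; Neural Network; Genetic Algorithm
Keywords (English): Affective Communication; Human-Computer Interaction; Emotion Recognition; Collaborative Decision Making; Neural Network; Genetic Algorithm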
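Affective communication is an essential skill for everyday interaction, and emotion recognition is an indispensable part of it. It nevertheless remains a challenging problem, because people understand the emotions of others through several kinds of vague information and a highly complex recognition process. Current human-computer interfaces cannot yet fully simulate the human mechanism of emotion recognition, but collaborative emotion recognition can be achieved by multiple virtual experts, each trained on different information. Each virtual expert executes a specific procedure, such as recognizing emotions from a different physiological signal or with a different classification method, so each has its own authority and strengths with respect to particular emotions or physiological signals. To handle the complex distribution of emotion data, we propose a hybrid neural network suited to classification problems with complex data distributions. We also propose a multi-modal collaborative model that combines different physiological signals and classification methods to improve the accuracy of emotion recognition. The collaborative model makes its decision in multiple stages and uses a genetic algorithm as its learning algorithm. In the first stage, once the physiological signals are input, each trained virtual expert uses its own technique to output a probability for every recognizable emotion. In the second stage, each expert's output is recalibrated by a transfer function learned from the characteristics of that expert. Finally, in the third stage, learned weights are used to compute, over all virtual experts, the probability of each emotion. In the experiments, we validate the proposed model on machine learning benchmark datasets and two public emotion databases. For preprocessing, we extract facial features with feature-based and appearance-based methods, and we use pitch, energy, speaking rate, and Mel-frequency cepstral coefficients as speech features. These features are used to train virtual experts that rely on different information or classification methods. The experimental results show that the proposed collaborative model, by combining virtual experts with different information and strengths, achieves better emotion recognition results.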
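The three stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the thesis: the sigmoid form of the transfer function and the names `equalize`, `collaborate`, `gains`, `biases`, and `weights` are hypothetical. The abstract says only that a learned transfer function recalibrates each expert's output and that learned weights combine the results.

```python
import math

def equalize(probs, gain, bias):
    """Stage 2 (assumed form): pass one expert's probabilities through a
    per-expert sigmoid transfer function, then renormalize to sum to 1."""
    z = [1.0 / (1.0 + math.exp(-(gain * p + bias))) for p in probs]
    total = sum(z)
    return [v / total for v in z]

def collaborate(expert_probs, gains, biases, weights):
    """Fuse the experts' outputs into one probability vector over emotions.

    expert_probs: stage-1 outputs, one probability list per expert
    gains, biases: hypothetical transfer-function parameters, one per expert
    weights: hypothetical authority weights, one list per expert with one
             entry per emotion
    """
    equalized = [equalize(p, g, b)
                 for p, g, b in zip(expert_probs, gains, biases)]
    n_emotions = len(expert_probs[0])
    # Stage 3: weighted aggregation across experts for each emotion.
    scores = [sum(w[k] * e[k] for w, e in zip(weights, equalized))
              for k in range(n_emotions)]
    total = sum(scores)
    return [s / total for s in scores]

# Two experts, three emotions; equal authority for both experts.
fused = collaborate(
    expert_probs=[[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]],
    gains=[1.0, 1.0],
    biases=[0.0, 0.0],
    weights=[[1, 1, 1], [1, 1, 1]],
)
print(fused.index(max(fused)))  # index of the winning emotion
```

In the model described by the abstract, the transfer-function parameters and the weights would be learned by the genetic algorithm rather than set by hand as they are here.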
Affective communication is an essential skill in daily interaction, and one that is also desirable in digital artifacts. To acquire this skill, digital artifacts first need the prerequisite ability of emotion recognition. This remains a challenging problem, because people recognize emotions through a highly complex process influenced by many vague factors, and technical solutions in the digital world cannot yet fully mimic that process. This study therefore proposes a collaborative emotion recognition method realized by multiple virtual experts that recognize human emotions from different perspectives (feature sets). Each virtual expert is an autonomous agent that implements a specific recognition technique in the individual recognition stage and shares its recognition result with the others in the collaborative recognition stage. To make an unbiased, high-accuracy collaborative decision, the proposed method first performs a reputed equalization process on the individual recognition results of all virtual experts. The basis of this process is built by a genetic learning procedure on the reputations and decision styles of the virtual experts. The equalized results are then aggregated and compromised according to the authority of the virtual experts to obtain the final decision. The proposed approach is verified on numerical data, machine learning benchmark datasets, the audio-visual eNTERFACE’05 emotion database, and the Berlin database of emotional speech. In the emotion recognition experiments, geometric feature-based and appearance-based facial features, the pitch and energy of the voice, the speed of speech, and Mel-Frequency Cepstral Coefficients (MFCCs) serve as features, and each virtual expert uses a full or partial combination of them. The experimental results show that the approach makes better group decisions.
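The genetic learning step can be illustrated with a toy example. The sketch below is a deliberate simplification and should not be read as the procedure from the thesis: it learns only one scalar authority weight per expert (not the reputations and decision styles the abstract mentions), uses plain classification accuracy as the fitness, and picks tournament selection, uniform crossover, and Gaussian mutation as generic operators. All function names, parameters, and the sample data are hypothetical.

```python
import random

def fuse(expert_probs, weights):
    """Weighted sum of the experts' probability vectors; returns the winning class."""
    n_classes = len(expert_probs[0])
    scores = [sum(w * p[k] for w, p in zip(weights, expert_probs))
              for k in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)

def fitness(weights, samples):
    """Fraction of samples whose fused decision matches the label."""
    hits = sum(fuse(probs, weights) == label for probs, label in samples)
    return hits / len(samples)

def learn_weights(samples, n_experts, pop_size=20, generations=30, seed=0):
    """Evolve one authority weight per expert (toy genetic algorithm)."""
    rng = random.Random(seed)
    # Start from the equal-authority baseline plus random individuals.
    population = [[1.0] * n_experts]
    population += [[rng.random() for _ in range(n_experts)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: fitness(w, samples),
                        reverse=True)
        next_gen = [ranked[0][:]]  # elitism: the best individual survives
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(ranked, 3), key=lambda w: fitness(w, samples))
            b = max(rng.sample(ranked, 3), key=lambda w: fitness(w, samples))
            # Uniform crossover, then Gaussian mutation clipped at zero.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [max(0.0, g + rng.gauss(0.0, 0.1)) for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=lambda w: fitness(w, samples))
    return best, fitness(best, samples)

# Made-up data: expert 0 is usually right, expert 1 is confident but often wrong.
samples = [
    ([[0.6, 0.4], [0.1, 0.9]], 0),
    ([[0.3, 0.7], [0.2, 0.8]], 1),
    ([[0.7, 0.3], [0.2, 0.8]], 0),
    ([[0.4, 0.6], [0.9, 0.1]], 1),
]
best_weights, best_accuracy = learn_weights(samples, n_experts=2)
print(best_weights, best_accuracy)
```

Because the equal-weight baseline is placed in the initial population and the best individual is always carried over, the accuracy returned by this sketch can never fall below that of unweighted voting on the training samples.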
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. INTRODUCTION
1.1. ISSUES AND CHALLENGES
1.2. MOTIVATION AND CONTRIBUTIONS
1.3. ORGANIZATION
CHAPTER 2. MULTI-MODAL FEATURE EXTRACTION
2.1. FACIAL FEATURE
2.1.1. Modified Active Shape Model (modified ASM)
2.1.2. Local Principal Texture Pattern (LPTP)
2.2. AUDIO FEATURE
2.2.1. Prosodic Feature
2.2.2. Mel-Frequency Cepstral Coefficients (MFCCs)
2.3. SEMI-SUPERVISED SELF-ORGANIZING MULTI-MODAL FEATURE MAP
CHAPTER 3. MULTI-MODAL EMOTION RECOGNITION METHODS
3.1. MLP-RBFN HYBRID NETWORK (MRHN)
3.1.1. Architecture of MRHN
3.1.2. Function of Hybrid Transfer Function
3.1.3. Learning Algorithm of MRHN
3.2. MINIMUM RISK NEURAL NETWORK (MRNN)
3.2.1. Architecture of MRNN
3.2.2. Learning Algorithm of MRNN
3.2.3. The Relation between MRNN and Weight Decay Technique
3.3. COLLABORATIVE DECISION MAKING MODEL
3.3.1. Virtual Expert Stage
3.3.2. Reputed Equalization Stage
3.3.3. Compromise Stage
3.3.4. Learning Algorithm of Collaborative Decision Making Model
CHAPTER 4. PERFORMANCE EVALUATION
4.1. NUMERICAL EXAMPLE
4.2. THE MACHINE LEARNING BENCHMARK DATASETS
4.3. THE AUDIO-VISUAL eNTERFACE’05 EMOTION DATABASE
4.4. THE BERLIN DATABASE OF EMOTIONAL SPEECH
CHAPTER 5. CONCLUSIONS AND FUTURE WORK
REFERENCES
AUTHOR’S PUBLICATIONS

