

Thesis Title (English): Real Mood Detection Using Denoising Autoencoder and LSTM
Advisor (English): Chung-Hsien Wu
Keywords (English): speech emotion recognition; long-term emotion tracking; mood detection; denoising autoencoder; long short-term memory
Considering that a person may exhibit several emotions at the same time, this thesis first builds an emotion profile prediction model based on support vector machine (SVM) classification to represent the distribution over emotion classes. Next, using a corpus with self-annotated labels, a Gaussian distribution is constructed to model the difference between the emotions perceived by others and those expressed by the speaker. Noise drawn from this distribution is used to generate additional input data, with the self-expressed emotion set as the target, so that a denoising autoencoder (DAE) can be trained as the emotion conversion model. Finally, to account for the temporal relationship of emotions, a long short-term memory (LSTM) model with memory cells is used to construct the historical trajectory of emotions, thereby achieving mood detection.

In a rapidly changing social environment, emotions are becoming more and more difficult for people to manage. Sometimes people do not even realize that they harbor negative emotions, and the accumulation of negative emotions can develop into mental illness. It is therefore essential to develop an emotion tracking system that helps users manage their emotions. In current studies, extended subjective self-report methods are generally used to measure emotion.
Even though it is commonly accepted that the emotion perceived by a listener is close to the emotion the speaker intends to convey, several studies have indicated that a mismatch remains between them. In addition, individuals with different personalities generally express emotion differently. Based on this observation, this thesis proposes an emotion conversion model that characterizes the relationship between the perceived emotion and the expressed emotion of a user with a specific personality; conversion from perceived to expressed emotion is applied according to the user's personality traits. The thesis treats mood swing as a long-term accumulation of emotions. A database containing users' long-term speech data and mood annotations was collected and used to construct the temporal relationship between emotion and mood.
To reflect a person's real mood, an SVM-based emotion model is developed to generate multiple probabilistic class labels (an emotion profile). Because expressed and perceived emotions differ, a Gaussian distribution is built over this difference and used to generate noisy data: for denoising autoencoder (DAE) training, the input is the expressed-emotion value contaminated by the generated noise, and the target is the clean expressed emotion. Finally, to model the temporal fluctuation of emotions, a long short-term memory (LSTM)-based mood model is constructed for mood detection.
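The DAE training scheme described above, with Gaussian-corrupted emotion profiles as input and the clean expressed-emotion profile as target, can be sketched as follows. This is a minimal NumPy illustration: the dimensions (4 emotion classes, 8 hidden units), the noise scale, and the toy Dirichlet data are assumptions for demonstration, not values taken from the thesis.

```python
import numpy as np

# Hypothetical sketch: a single-hidden-layer denoising autoencoder (DAE)
# trained to map noise-corrupted emotion profiles back to clean
# expressed-emotion targets.

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    def __init__(self, n_in, n_hidden, lr=0.5):
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.lr = lr

    def forward(self, x):
        h = sigmoid(x @ self.W1 + self.b1)        # encoder
        return h, sigmoid(h @ self.W2 + self.b2)  # decoder (reconstruction)

    def train_step(self, noisy, clean):
        h, out = self.forward(noisy)
        err = out - clean                          # reconstruction error
        d_out = err * out * (1.0 - out)            # backprop through sigmoid
        d_h = (d_out @ self.W2.T) * h * (1.0 - h)
        n = len(noisy)
        self.W2 -= self.lr * (h.T @ d_out) / n
        self.b2 -= self.lr * d_out.mean(axis=0)
        self.W1 -= self.lr * (noisy.T @ d_h) / n
        self.b1 -= self.lr * d_h.mean(axis=0)
        return float((err ** 2).mean())            # MSE loss

# Toy "emotion profiles": 4-class probability vectors.
clean = rng.dirichlet(np.ones(4), size=200)
dae = DenoisingAutoencoder(n_in=4, n_hidden=8)

losses = []
for epoch in range(500):
    # Fresh Gaussian corruption each epoch, as in DAE training.
    noisy = clean + rng.normal(0.0, 0.1, clean.shape)
    losses.append(dae.train_step(noisy, clean))
```

A real implementation would use a deep-learning framework rather than hand-rolled backprop; the point here is only the input/target pairing, corrupted profile in, clean profile out, that defines denoising training.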
In the mood detection experiments, the mood database was provided by 10 participants and contained 104 positive and 96 negative mood samples. Leave-one-speaker-out cross-validation was employed for evaluation. Experimental results show that the proposed method achieved a detection accuracy of 64.5%, a 5% improvement over the HMM-based method. In the future, tracking users' dialog content and blog posts could be applied to obtain better performance.
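The leave-one-speaker-out protocol used in the evaluation can be illustrated as below: each of the 10 participants in turn serves as the test speaker while the remaining nine provide the training data. The classifier here is a trivial majority-vote stand-in, not the thesis's DAE+LSTM model, and the synthetic labels and session counts are assumptions for demonstration only.

```python
import numpy as np

# Hypothetical leave-one-speaker-out (LOSO) cross-validation sketch.
rng = np.random.default_rng(1)
speakers = np.repeat(np.arange(10), 20)          # 10 speakers x 20 sessions
labels = rng.integers(0, 2, size=speakers.size)  # 0 = negative, 1 = positive mood

accuracies = []
for held_out in np.unique(speakers):
    train_mask = speakers != held_out            # nine training speakers
    test_mask = ~train_mask                      # one held-out test speaker
    # Stand-in "model": predict the majority mood of the training data.
    majority = int(labels[train_mask].mean() >= 0.5)
    accuracies.append(float((labels[test_mask] == majority).mean()))

mean_acc = float(np.mean(accuracies))            # speaker-averaged accuracy
```

Averaging per-speaker accuracies this way ensures no speaker's data ever appears in both training and test folds, which is the property LOSO is chosen for in speaker-dependent tasks like emotion recognition.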

中文摘要 I
Abstract III
誌謝 V
Table of Contents VI
List of Tables IX
List of Figures X
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Background 2
1.3 Literature Review 3
1.3.1 Emotional Speech Databases 3
1.3.2 Emotion Perception and Emotion Expression 3
1.3.3 Emotion and Personality Trait 5
1.3.4 Long-term Tracking 7
1.4 Problem and Goal 8
1.5 The Organization of this Thesis 9
Chapter 2 Emotional Database Design and Collection 10
2.1 Emotion with Personality Database (EP-DB) 10
2.1.1 Data Collection 11
2.1.2 Emotional Video Selection 14
2.1.3 Environment 15
2.1.4 Data Annotation 16
2.2 Long-Term Emotion Database (LT-DB) 17
2.2.1 Data Collection 18
2.2.2 Environment 19
2.2.3 Data Annotation 20
2.3 MHMC Emotion Database 21
Chapter 3 Proposed Method 22
3.1 Speech Preprocessing 23
3.2 Emotion Profile Prediction 25
3.3 Emotion Conversion with Personality 26
3.3.1 Training Data Construction 26
3.3.2 Denoising Autoencoder 29
3.4 Long-Term Tracking and Mood Detection 33
Chapter 4 Experimental Results and Discussion 38
4.1 Database Analysis 38
4.2 System Performance 40
4.2.1 Emotion Profile Prediction 40
4.2.2 Emotion Conversion 41
4.2.3 Mood Detection 45
4.3 Performance Comparison 47
Chapter 5 Conclusions and Future Work 49
References 50

[1] M. Reddy, "Depression: the disorder and the burden," Indian Journal of Psychological Medicine, vol. 32, no. 1, p. 1, 2010.
[2] "Google Ventures Investments 2015 Year in Review," 2015.
[3] S. P. Robbins, Organizational Behavior, 14/E: Pearson Education India, 2001.
[4] V. Petrushin, Emotion in speech: Recognition and application to call centers.
[5] E. Douglas-Cowie, R. Cowie, and M. Schröder, A new emotion database: considerations, sources and scope.
[6] N. Amir, S. Ron, and N. Laor, Analysis of an emotional speech corpus in Hebrew based on objective criteria.
[7] F. Yu, E. Chang, Y.-Q. Xu, and H.-Y. Shum, Emotion detection from speech to enrich multimedia content, pp. 550-557.
[8] F. Schiel, S. Steininger, and U. Türk, The SmartKom Multimodal Corpus at BAS.
[9] F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss, A database of German emotional speech, pp. 1517-1520.
[10] C. Busso and S. S. Narayanan, The expression and perception of emotions: comparing assessments of self versus others, pp. 257-260.
[11] K. P. Truong, M. A. Neerincx, and D. A. Van Leeuwen, Assessing agreement of observer- and self-annotations in spontaneous multimodal emotion data, pp. 318-321.
[12] C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang, S. Lee, and S. S. Narayanan, "IEMOCAP: Interactive emotional dyadic motion capture database," Language Resources and Evaluation, vol. 42, no. 4, pp. 335-359, 2008.
[13] R. R. McCrae and O. P. John, "An introduction to the five-factor model and its applications," Journal of Personality, vol. 60, no. 2, pp. 175-215, 1992.
[14] S. Kshirsagar, A multilayer personality model, pp. 107-115.
[15] Personality-Central, "Extroversion-Introversion preferences."
[16] American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders (DSM-5): American Psychiatric Pub, 2013.
[17] E.-H. Jang, B.-J. Park, S.-H. Kim, and J.-H. Sohn, Emotion classification based on physiological signals induced by negative emotions: Discrimination of negative emotions by machine learning algorithm, pp. 283-288.
[18] A. Gaggioli, P. Cipresso, S. Serino, and G. Riva, "Psychophysiological correlates of flow during daily activities," Stud. Health Technol. Inform., vol. 191, pp. 65-69, 2013.
[19] E. Mostafa, A. Farag, A. Shalaby, A. Ali, T. Gault, and A. Mahmoud, Long term facial parts tracking in thermal imaging for uncooperative emotion recognition, pp. 1-6.
[20] L. Zhong, Y. Li, X. Wei, G. Li, Z. Wang, and Y. Jiang, System design for monitoring infant speech emotion, pp. 952-955.
[21] K.-Y. Lam, J. Wang, J. K.-Y. Ng, S. Han, L. Zheng, C. H. C. Kam, and C. J. Zhu, "SmartMood: Toward pervasive mood tracking and analysis for manic episode detection," IEEE Transactions on Human-Machine Systems, vol. 45, no. 1, pp. 126-131, 2015.
[22] R. F. Dickerson, E. I. Gorlin, and J. A. Stankovic, Empath: a continuous remote emotional health monitoring system for depressive illness, p. 5.
[23] K.-H. Chang, D. Fisher, and J. Canny, "Ammon: A speech analysis library for analyzing affect, stress, and mental health on mobile phones," Proceedings of PhoneSense, vol. 2011, 2011.
[24] R. Wiseman, 59 Seconds: Motivation: Think a Little, Change a Lot: Pan Macmillan, 2011.
[25] T. Giannakopoulos, "A method for silence removal and segmentation of speech signals, implemented in Matlab," University of Athens, Athens, vol. 2, 2009.
[26] F. Eyben, M. Wöllmer, and B. Schuller, openSMILE: the Munich versatile and fast open-source audio feature extractor, pp. 1459-1462.
[27] E. Mower and S. Narayanan, A hierarchical static-dynamic framework for emotion classification, pp. 2372-2375.
[28] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[29] E. Mower, M. J. Mataric, and S. Narayanan, "A framework for automatic human emotion classification using emotion profiles," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 5, pp. 1057-1070, 2011.
[30] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, pp. 1096-1103.
[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[32] M. Wöllmer, A. Metallinou, N. Katsamanis, B. Schuller, and S. Narayanan, Analyzing the memory of BLSTM neural networks for enhanced emotion classification in dyadic spoken interactions, pp. 4157-4160.
