National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: 李翰 (Han Li)
Title: Interaction Style Detection Based on Fused Cross-Correlation Model in Spoken Conversation
Advisor: 吳宗憲 (Chung-Hsien Wu)
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Computer Science and Information Engineering (Master's and Doctoral Program)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document Type: Academic thesis
Year of Publication: 2012
Graduation Academic Year: 100 (2011-2012)
Language: English
Pages: 66
Chinese Keywords: 互動風格 (Interaction Style), 融合交互關聯模型 (Fused Cross-Correlation Model)
English Keywords: Interaction Style, Fused Cross-Correlation Model, FCCM
Chinese Abstract (translated)
Han Li* Chung-Hsien Wu**
Department of Computer Science and Information Engineering, National Cheng Kung University

This thesis proposes a multi-modal model fusion technique, the Fused Cross-Correlation Model (FCCM), which combines emotion, personality traits, and dialogue history information to improve the accuracy of interaction style (IS) detection. Given the user's current interaction style, a conventional spoken dialogue system (SDS) can move beyond monotonic responses: instead of returning a single fixed reply or randomly selecting a pre-defined one, the system can diversify its responses according to the detected style.

Because inferring a high-level concept such as interaction style from low-level acoustic features alone is insufficient, and following the analysis in the interaction-style literature, this thesis first incorporates emotion and personality-trait information. Since interaction style is expressed over a long time span within a dialogue, the history of interaction styles in the conversation is also included, so that the user's current style can be inferred more precisely. Through corpus collection and analysis, we estimate the cross-correlations among interaction style, emotion, personality traits, and history information, and build the FCCM on these statistics to integrate all sources of evidence when inferring the user's interaction style.

For feature extraction, we consider not only the prosodic features of speech but also the dialogue content, i.e., semantic information, to detect personality traits and interaction style more accurately. Because the collected corpus consists of natural, continuous conversations, strong emotional expression (high arousal) degrades the output of the automatic speech recognizer (ASR). We therefore train two recognition models, one for high-arousal and one for low-arousal speech, and use emotion detection to select between them, improving ASR accuracy. On the text side, Latent Semantic Analysis (LSA) extracts the latent meaning of lexical features, which are combined with prosodic features to train personality-trait and interaction-style detection models using a support vector machine (SVM). In addition, we propose a new emotion recognition method that models the temporal course of emotion within a dialogue (Emotion Temporal Course Recognition), using emotion grammar rules to detect emotion more precisely.

Finally, the emotion, personality-trait, history, and preliminary interaction-style results are combined, and the FCCM refines the interaction-style decision. Experimental results show that the proposed FCCM achieves an interaction-style recognition rate of 73.09%, an 11.21% improvement over an SVM classifier that uses only prosodic and semantic features. These results demonstrate a clear performance gain in practice and confirm the finding in the literature that interaction style is significantly correlated with emotion and personality traits.

Keywords - Interaction Style, Fused Cross-Correlation Model

*Author **Advisor
Abstract
Interaction Style Detection Based on
Fused Cross-Correlation Model in Spoken Conversation
Han Li* Chung-Hsien Wu**

Department of Computer Science and Information Engineering,
National Cheng Kung University, Tainan, Taiwan, R.O.C.

In this thesis, a multi-modal fusion technique named the Fused Cross-Correlation Model (FCCM) is proposed. The user's emotion, personality traits, and dialogue history are combined for interaction style (IS) detection to address the problem of monotonic responses in a conventional spoken dialogue system (SDS). Based on the user's interaction style, the SDS can generate versatile responses instead of randomly selecting a pre-defined one, so that conversations between human and system become more dynamic and natural. Detecting IS from low-level features alone is difficult, because IS carries high-level psychological meaning. Accordingly, emotion recognition and personality trait detection are employed in IS detection. Because IS is a long-term external expression, the IS information of previous turns in the conversation is also integrated. Finally, the cross-correlation coefficients among emotion, personality traits, history information, and IS are estimated from a training corpus to construct the FCCM. For feature extraction, both prosodic and linguistic features of the user's speech are used. However, emotional speech degrades the performance of the automatic speech recognizer (ASR). To address this, the corpus is split into two categories according to arousal level, and the acoustic models of the ASR are trained separately on each, making the recognition output more robust. Latent Semantic Analysis (LSA) is then applied to extract latent semantic features, which form the linguistic features. A support vector machine (SVM) is used to train the personality trait detection model and the IS scoring model. In addition, a new method that models the temporal course of emotion is proposed to improve emotion recognition accuracy.
Finally, the results of emotion recognition, personality trait detection, IS scoring, and the historical IS are fused by the FCCM to obtain the final IS detection result. Experimental results show that the proposed approach achieves 73.09% accuracy, 11.21% higher than an SVM baseline for IS detection. The results confirm that the correlations among IS, emotion, and personality traits are beneficial to IS detection in a spoken dialogue system.
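The LSA-plus-classifier pipeline described in the abstract can be sketched as follows. This is a minimal toy illustration, not the thesis implementation: the utterances, class names, and latent dimensionality are all hypothetical, and a nearest-centroid rule stands in for the SVM scorer.

```python
import numpy as np

# Hypothetical toy corpus; the thesis's actual data and IS labels differ.
docs = [
    "yes sure great idea",
    "great yes happy to help",
    "no stop leave now",
    "no go away now",
]
labels = ["expressive", "expressive", "reserved", "reserved"]

vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for w in vocab] for d in docs], float)

# LSA: truncated SVD of the document-term matrix, keeping k latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2

def fold_in(text):
    """Project a new utterance into the k-dimensional latent space."""
    q = np.array([text.split().count(w) for w in vocab], float)
    return (q @ Vt[:k].T) / s[:k]

def classify(text):
    """Assign the label of the nearest class centroid (cosine similarity)."""
    q = fold_in(text)
    sims = {}
    for lab in set(labels):
        idx = [i for i, l in enumerate(labels) if l == lab]
        c = U[idx, :k].mean(axis=0)
        sims[lab] = q @ c / (np.linalg.norm(q) * np.linalg.norm(c) + 1e-12)
    return max(sims, key=sims.get)

print(classify("yes great"))   # -> expressive
print(classify("no now"))      # -> reserved
```

Folding a query into the latent space with the stored singular vectors is the standard LSA trick; in practice the latent features would then be concatenated with prosodic features and passed to an SVM as the abstract describes.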

Keyword - Interaction Style, Fused Cross-Correlation Model, FCCM

* The Author ** The Advisor
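The abstract does not spell out the FCCM's mathematical form. As a generic illustration of the score-level fusion idea only (not the FCCM itself), the component detectors' posteriors over IS classes could be combined with a weighted geometric mean; the class names, posteriors, and weights below are invented for illustration.

```python
import math

def fuse(posteriors, weights):
    """Weighted log-linear fusion of per-detector class posteriors."""
    classes = posteriors[0].keys()
    scores = {c: sum(w * math.log(p[c]) for p, w in zip(posteriors, weights))
              for c in classes}
    return max(scores, key=scores.get)

# Hypothetical posteriors from the four evidence sources named in the thesis:
emotion_post = {"directing": 0.3, "supporting": 0.7}
trait_post   = {"directing": 0.4, "supporting": 0.6}
is_score     = {"directing": 0.8, "supporting": 0.2}
history_post = {"directing": 0.5, "supporting": 0.5}

best = fuse([emotion_post, trait_post, is_score, history_post],
            weights=[0.2, 0.2, 0.4, 0.2])
print(best)   # -> directing
```

In the thesis, the analogous combination weights are derived from cross-correlations among emotion, personality traits, history, and IS estimated on the training corpus, rather than being hand-set as here.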
Catalog
Chinese Abstract
Abstract
Chapter 1 - Introduction
1.1 Research Background
1.2 Related Work
1.3 Objective and Motivation
1.4 Research Methods Introduction
1.5 Chapter Introduction
Chapter 2 - Introduction of Interaction Style
2.1 Origin of Interaction Style
2.2 Definition of Interaction Style
2.3 Application of Interaction Style
2.4 Psychological Tendencies of Interaction Style
a. Emotional Tendencies
b. Personality Trait Tendencies
Chapter 3 - Corpus Collection
3.1 Personality Test
3.2 Corpus Design
3.3 Corpus Recording
3.4 Corpus Tagging
Chapter 4 - Introduction of the System Architecture
4.1 Training Phase
a. ASR Model Training
b. Personality Detection Model Training
c. Interaction Style Scoring Model Training
d. Temporal Course Emotion Detection Model Training
e. Cross-Correlation Model Training
4.2 Verification
Chapter 5 - Interaction Style Detection
5.1 Interaction Style Detection Based on FCCM
5.2 Temporal Phase Based Emotion Recognition
a. Temporal Course Emotion Detection Model Training
b. Emotion Recognition
5.3 Personality Trait Detection
a. Linguistic Feature Extraction and Model Training
b. Prosodic Feature Extraction and Model Training
c. Support Vector Machine (SVM)
d. Flow Path of Personality Trait Detection
5.4 Interaction Style Scoring
a. Linguistic Feature Extraction and Model Training
b. Prosodic Feature Extraction and Model Training
c. Flow Path of Interaction Style Scoring
Chapter 6 - Experiment
6.1 Experiment Design
6.2 Tools
6.3 ASR Recognition
6.4 Emotion Recognition
6.5 Personality Detection
6.6 Interaction Style Scoring
6.7 Interaction Style Detection Based on FCCM
Chapter 7 - Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
References
