
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: 李智宏 (Chih-Hung Li)
Title: 基於新穎的音節切割及音節相關性之方法應用於笑聲偵測系統研究與實現
Title (English): Research and Implementation of the Laughter Detection System Based on a Novel Syllable Segmentation and Correlation Methods
Advisor: 王駿發 (Jhing-Fa Wang)
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Electrical Engineering (電機工程學系碩博士班)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2012
Graduation Academic Year: 100 (2011-2012)
Language: English
Pages: 45
Keywords (Chinese): 自相關函數、聲道轉換偵測器、梅爾倒頻譜係數、動態時間校正、音高分析
Keywords (English): Laughter detection, Autocorrelation function, Vocal tract transfer detector, Mel-scale frequency cepstral coefficient, Dynamic time warping, Pitch analysis
Statistics:
  • Cited: 0
  • Views: 118
  • Rating: (none)
  • Downloads: 5
  • Saved to reading lists: 0
Abstract (Chinese):
In this high-tech era people face ever-greater stress, and cases of depression are increasingly common. This often happens because people have no clear picture of their own stress, which can lead to irreparable regret. A laughter detection system can record the length and frequency of a user's laughter to help people understand themselves. In speech recognition, laughter often lowers the recognition rate, so a laughter detection system can also serve as a signal pre-processing stage to improve a recognizer's performance.
Previous work on laughter detection and recognition has focused mainly on extracting acoustic features of laughter and classifying them with an effective classifier. Commonly used acoustic features include Mel-scale frequency cepstral coefficients and perceptual linear predictive coefficients, and the usual classifiers are hidden Markov models and support vector machines. To reach high accuracy, these methods must train and build complete acoustic models, but laughter is so varied and variable that building complete models is very difficult. To avoid this problem, we exploit the self-correlation property of laughter and propose a speaker-independent, low-computation, training-free laughter detection system. The system uses a modified autocorrelation function and a vocal tract transfer detector to realize a syllable segmentation algorithm that separates every syllable in an audio segment; it then takes Mel-scale frequency cepstral coefficients as features and uses dynamic time warping together with pitch-trend analysis to compute a syllable correlation score. Finally, the system judges three consecutive highly correlated syllables to be a laughter segment.
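This record does not include the thesis's exact MACF or VTTD formulations, so the following is only a minimal sketch of the underlying idea: short-time autocorrelation separates periodic (voiced) frames, such as laughter syllable nuclei, from unvoiced frames and silence. The frame size, hop, and threshold are illustrative assumptions, not the thesis's values.

```python
import numpy as np

FRAME_LEN, HOP, FS = 400, 160, 16000     # 25 ms frames, 10 ms hop at 16 kHz

def voiced_flags(x: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Flag frames whose normalized autocorrelation has a strong pitch peak.

    Periodic (voiced) frames, e.g. the nuclei of laughter syllables, show a
    clear autocorrelation peak at the pitch lag; unvoiced frames and silence
    do not.
    """
    lo, hi = FS // 400, FS // 50         # search lags covering 50-400 Hz pitch
    flags = []
    for start in range(0, len(x) - FRAME_LEN + 1, HOP):
        frame = x[start:start + FRAME_LEN]
        frame = frame - frame.mean()     # remove DC offset before correlating
        ac = np.correlate(frame, frame, mode="full")[FRAME_LEN - 1:]
        flags.append(ac[0] > 0 and ac[lo:hi].max() / ac[0] > threshold)
    return np.array(flags)

# Runs of consecutive voiced frames give syllable candidates; the thesis
# refines their boundaries with its vocal tract transfer detector (VTTD).
```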
Evaluated on ten subjects with 150 sentences in total, the proposed system reaches 94% accuracy. For computation speed, tested on an HTC One X, it processes an average of 2.63 seconds of signal per second. These results indicate that the proposed system is both efficient and real-time.
Abstract (English):
In the high-tech age, people's stress keeps increasing, and depression often appears because people do not understand their own stress; left uncontrolled, that stress leads to regret. Laughter is an indicator that can help people gauge their stress, since high stress often occurs in people who seldom laugh. A laughter detection system can record the length and frequency of laughter to help people understand themselves. In speech recognition, a laughter detection system can also serve as pre-processing to improve performance, because the recognition rate drops when the speech signal contains laughter.
Previous laughter classification work has focused on audio feature extraction and model building. Most studies used Mel-scale frequency cepstral coefficients (MFCCs) and perceptual linear predictive coefficients (PLPs) as audio features, with hidden Markov models (HMMs) and support vector machines (SVMs) as the classifiers. Achieving high recognition rates generally requires a large database and a thorough training process, yet the variance of acoustic features between different kinds of laughter remains a serious problem. We propose a laughter detection system based on the correlation characteristics of the signal; it is speaker-independent, low-computation, and training-free. A modified autocorrelation function (MACF) is combined with a new approach, called the vocal tract transfer detector (VTTD), to segment an input signal into a syllable stream. Next, based on each syllable's MFCCs, the correlation between two consecutive syllables is measured with the dynamic time warping (DTW) algorithm and pitch analysis. Three consecutive syllables with high correlation are considered a laughter segment.
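As a rough illustration of the correlation stage (not the thesis's exact formulation), the sketch below measures DTW distance between the MFCC sequences of adjacent syllables and applies the three-consecutive-syllable rule. The Euclidean frame distance, the length normalization, and the threshold value are assumptions for the sketch.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Length-normalized DTW distance between two non-empty MFCC sequences,
    each an array of shape [frames, coefficients]."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])    # Euclidean frame distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m] / (n + m)

def find_laughter(syllable_mfccs: list, threshold: float = 0.5) -> list:
    """Return indices k where syllables k, k+1, k+2 all correlate strongly,
    i.e. candidate laughter segments under the three-syllable rule."""
    hits = []
    for k in range(len(syllable_mfccs) - 2):
        d1 = dtw_distance(syllable_mfccs[k], syllable_mfccs[k + 1])
        d2 = dtw_distance(syllable_mfccs[k + 1], syllable_mfccs[k + 2])
        if d1 < threshold and d2 < threshold:
            hits.append(k)
    return hits
```

In the thesis the DTW score is further combined with pitch analysis; either way, a low, stable distance across three consecutive syllables captures the repetitive "ha-ha-ha" structure that distinguishes laughter from ordinary speech.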
In our experiments, the proposed system achieves an accuracy of 94% on ten subjects with 150 sentences in total. For computation time, we chose the HTC One X smartphone as the experimental platform; the system processes an average of 2.63 seconds of signal per second. These results indicate that the proposed method detects laughter effectively and that the system runs in real time.
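For context, that throughput corresponds to a real-time factor well below one, which is what "real-time" means here; a quick check:

```python
audio_per_wall_second = 2.63          # seconds of audio processed per wall-clock second
rtf = 1.0 / audio_per_wall_second     # real-time factor = processing time / audio length
print(f"RTF = {rtf:.2f}")             # -> RTF = 0.38, comfortably under 1.0
```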

Table of Contents

ABSTRACT (CHINESE)
ABSTRACT (ENGLISH)
ACKNOWLEDGEMENTS
INDEX OF FIGURES
INDEX OF TABLES
CHAPTER 1 INTRODUCTION
1.1 BACKGROUND AND MOTIVATION
1.2 OBJECTIVE OF THESIS
1.3 ORGANIZATION OF THESIS
CHAPTER 2 RELATED WORKS AND PROPOSED SYSTEM OVERVIEW
2.1 FEATURE EXTRACTION FOR LAUGHTER
2.1.1 Mel-scale Frequency Cepstral Coefficients
2.1.2 Perceptual Linear Predictive Coefficients
2.2 CLASSIFIER AND STATISTICAL MODEL
2.2.1 Hidden Markov Model
2.2.2 Support Vector Machine
2.3 CHALLENGE AND THE PROPOSED METHOD
2.4 SYSTEM OVERVIEW
CHAPTER 3 SYLLABLE SEGMENTATION AND CORRELATION
3.1 SYLLABLE SEGMENTATION
3.1.1 Modified Autocorrelation Function
3.1.2 Voice Activity Detection
3.1.3 Vocal Tract Transfer Detector
3.2 SYLLABLE CORRELATION
3.2.1 Mel-scale Frequency Cepstral Coefficients
3.2.2 Dynamic Time Warping
3.2.3 Pitch Analysis
3.2.4 Short Pause Analysis
CHAPTER 4 ENVIRONMENT SETTING AND TEST PLATFORM
4.1 INTRODUCTION TO HW AND SW PLATFORM
4.2 ANDROID SYSTEM ARCHITECTURE
4.3 ENVIRONMENT SETTING
4.3.1 Smart Phone Environment Setting
4.3.2 Experimental Testing Corpora
CHAPTER 5 EXPERIMENTAL RESULTS
5.1 EXPERIMENTAL RESULT OF LAUGHTER CLASSIFICATION
5.2 EXPERIMENTAL RESULT ON SMART PHONE
CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
6.1 CONCLUSIONS
6.2 FUTURE WORKS
REFERENCES
AUTHOR'S BIOGRAPHICAL NOTES

