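Author: 孔祥勳
Author (English): Shiang-Shiun Kung
Title (Chinese): 演唱歌詞之正確性評估方法研究
Title (English): A Study of Singing Evaluation Methods for the Sung Lyrics
Advisor: 蔡偉和
Advisor (English): Wei-Ho Tsai
Committee members: 江振宇, 王家慶, 黃士嘉
Committee members (English): Chen-Yu Chiang, Jia-Ching Wang, Shih-Chia Huang
Oral defense date: 2016-07-18
Degree: Master's
Institution: National Taipei University of Technology (國立臺北科技大學)
Department: Department of Electronic Engineering, Master's Program (In-service Master's Program)
Discipline: Engineering
Sub-discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Graduation academic year: 104 (ROC calendar, i.e. 2015-2016)
Language: Chinese
Keywords (Chinese): 持續期模型; 母音裁剪; 母音壓縮; 唱詞驗證; 歌唱評分
Keywords (English): Duration Model; Vowel Decimation; Vowel Shrinking; Sung Lyrics Verification; Singing Evaluation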
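Abstract (translated from the Chinese): This study aims to develop a sung lyrics verification system that automatically determines whether a singer has sung the wrong lyrics and then identifies where the errors occur. Sung lyrics verification resembles utterance verification, so intuitively the methods used in speech recognition should apply. However, because a singing signal can be regarded as a stretched and deformed version of a speech signal, we found that applying utterance verification directly to sung lyrics does not perform as well as expected. We therefore attempt improvements on two fronts: signal processing and acoustic modeling. On the signal side, since vowels in singing are often lengthened to several times their spoken duration, making the signal differ greatly from speech, we locate the vowels in the singing and shrink or decimate them so that the signal approaches speech and the utterance verification method can operate more normally. On the acoustic model side, we use a duration model to raise the acoustic model's tolerance to variations in sound length and thereby improve its discrimination. In addition, we mark the positions of incorrectly sung lyrics by observing the distribution of frames after verification. Experimental results show that sung lyrics verification reaches an average accuracy of 72% with vowel shrinking or decimation and 90% with the duration model, both clearly better than the 63% obtained by applying utterance verification directly; the accuracy of marking the positions of incorrectly sung lyrics reaches 72%.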
Abstract (English): This study proposes a sung lyrics verification system that detects whether the lyrics sung by a performer are incorrect and then points out where the performer made the mistake. In essence, sung lyrics verification is similar to the problem of speech utterance verification in the speech recognition research community, so the techniques of the latter can in principle be applied to the former. However, our preliminary experiment found that a speech utterance verification system cannot handle singing data well, mainly because of the significant differences between singing and speech. To tackle this problem, we develop two strategies, one from a signal processing perspective and one from an acoustic modeling perspective. On the signal processing side, recognizing that vowels are often lengthened during singing, we propose vowel shrinking and vowel decimation to adjust the length of a sung vowel to its normal length in speech. On the modeling side, we incorporate a duration model into the acoustic modeling to reduce the differences between singing and speech. Our experiments show that the proposed methods improve sung lyrics verification to 72% accuracy with vowel shrinking or vowel decimation and to 90% accuracy with the duration model, compared with 63% accuracy for the baseline speech utterance verification system.
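The vowel decimation idea in the abstract can be illustrated with a minimal sketch. This record does not describe the thesis's actual decimation procedures or how vowel positions are found, so everything below is an assumption for illustration: the function name `decimate_vowels`, the `max_vowel_frames` cap, and the choice to keep evenly spaced frames are all hypothetical, and vowel segments are assumed to be given as frame index ranges.

```python
from typing import List, Sequence, Tuple


def decimate_vowels(
    frames: Sequence,
    vowel_segments: Sequence[Tuple[int, int]],
    max_vowel_frames: int,
) -> List:
    """Shorten every vowel segment to at most max_vowel_frames frames.

    frames: one item per analysis frame (e.g. a feature vector).
    vowel_segments: sorted, non-overlapping, half-open (start, end)
        frame index ranges. Assumed to come from a separate vowel
        detector, which is not part of this sketch.
    max_vowel_frames: hypothetical cap standing in for a "normal"
        spoken vowel length.
    """
    if max_vowel_frames < 1:
        raise ValueError("max_vowel_frames must be at least 1")

    out: List = []
    cursor = 0
    for start, end in vowel_segments:
        # Frames outside vowel segments are copied unchanged.
        out.extend(frames[cursor:start])
        length = end - start
        if length <= max_vowel_frames:
            # Vowel is already short enough; keep it as is.
            out.extend(frames[start:end])
        else:
            # Keep max_vowel_frames frames spread evenly over the vowel.
            for k in range(max_vowel_frames):
                out.append(frames[start + (k * length) // max_vowel_frames])
        cursor = end
    # Copy whatever follows the last vowel segment.
    out.extend(frames[cursor:])
    return out
```

For example, with 20 frames, a 10-frame vowel at frames 2 to 11 and a cap of 5, the vowel is reduced to every second frame while the surrounding frames are untouched. Dropping frames is the simplest way to shorten a segment; the vowel shrinking approach named in the abstract instead compresses the vowel's duration, which this sketch does not attempt.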
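Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Literature Review
Chapter 3 Utterance Verification System
3.1 Utterance Verification in Speech Recognition
3.2 Applying Speech Utterance Verification to Sung Lyrics Verification
Chapter 4 Improving the Utterance Verification System by Processing the Singing Signal
4.1 Improvements at the Signal Level
4.1.1 Vowel Shrinking
4.1.2 Vowel Decimation
4.2 Improvements at the Acoustic Model Level
4.2.1 Adding a Duration Model
Chapter 5 Locating Incorrectly Sung Lyrics
5.1 Locating Principle
5.2 Locating Performance with the Duration Model
Chapter 6 Experiments and Evaluation
6.1 Database
6.1.1 Description of the Simulated Conditions
6.2 Verification Results
6.2.1 Verification Results without Improvement
6.2.2 Verification Results with Vowel Shrinking
6.2.3 Verification Results with Vowel Decimation Method 1
6.2.4 Verification Results with Vowel Decimation Method 2
6.2.5 Verification Results with the Duration Model
6.3 Locating Results
Chapter 7 Conclusions and Future Work
References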
Electronic full text (Internet release date: 2021-08-01)