Graduate student: Shiang-Shiun Kung
Thesis title: A Study of Singing Evaluation Methods for the Sung Lyrics
Advisor: Wei-Ho Tsai
Oral examination committee: Chen-Yu Chiang, Jia-Ching Wang, Shih-Chia Huang
Keywords: Duration Model, Vowel Decimation, Vowel Shrinking, Sung Lyrics Verification, Singing Evaluation
This study proposes a sung lyrics verification system that detects whether the lyrics sung by a performer are incorrect and points out the potential mistake the performer made. In essence, sung lyrics verification resembles speech utterance verification in speech recognition research, so the techniques of the latter can be applied to the former. However, our preliminary experiment found that a speech utterance verification system handles singing data poorly, mainly because singing differs significantly from speech. To tackle this problem, we develop two strategies, one from a signal processing perspective and one from a model processing perspective. On the signal processing side, recognizing that vowels are often lengthened during singing, we propose vowel shrinking and vowel decimation to adjust the length of a sung vowel to its normal spoken length. On the model processing side, we incorporate a duration model into the acoustic modeling to reduce the differences between singing and speech. Our experiments show that the proposed methods improve the performance of sung lyrics verification to 72% and 90% accuracy using vowel shrinking, vowel decimation, and the duration model approach, respectively, compared with 63% accuracy for the baseline speech utterance verification system.
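The abstract does not spell out how vowel decimation is carried out, so the following is only a minimal sketch of one plausible reading: within each vowel segment that is longer than a chosen limit, keep an evenly spaced subset of feature frames and leave all other frames untouched. The function name `decimate_vowel_frames`, the parameter `max_vowel_frames`, and the assumption that vowel boundaries are already known (for example from a forced alignment) are all hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def decimate_vowel_frames(
    frames: Sequence[Frame],
    vowel_spans: Sequence[Tuple[int, int]],
    max_vowel_frames: int,
) -> List[Frame]:
    """Shorten over-long vowel segments by dropping frames.

    frames           -- per-frame features (any objects), in time order
    vowel_spans      -- non-overlapping half-open (start, end) frame indices
                        of vowel segments; assumed to be given
    max_vowel_frames -- hypothetical cap standing in for a "normal spoken"
                        vowel length, in frames
    """
    if max_vowel_frames <= 0:
        raise ValueError("max_vowel_frames must be positive")

    out: List[Frame] = []
    cursor = 0
    for start, end in sorted(vowel_spans):
        # Copy the non-vowel frames before this vowel unchanged.
        out.extend(frames[cursor:start])
        length = end - start
        if length <= max_vowel_frames:
            # Vowel is already short enough: keep every frame.
            out.extend(frames[start:end])
        else:
            # Keep max_vowel_frames evenly spaced frames from the vowel.
            step = length / max_vowel_frames
            out.extend(
                frames[start + int(i * step)] for i in range(max_vowel_frames)
            )
        cursor = end
    # Copy whatever follows the last vowel.
    out.extend(frames[cursor:])
    return out


if __name__ == "__main__":
    # Toy example: 20 frames, one 10-frame "vowel" capped at 4 frames.
    toy_frames = list(range(20))
    print(decimate_vowel_frames(toy_frames, [(5, 15)], max_vowel_frames=4))
```

Dropping frames changes only the duration of the segment that the recognizer sees, which is the stated goal of bringing a sung vowel closer to spoken length; how the thesis chooses which frames to keep, and how its two decimation variants differ, is not stated in this record.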
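Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Literature Review
Chapter 3 Utterance Verification System
3.1 Utterance Verification in Speech Recognition
3.2 Applying Speech Utterance Verification to Sung Lyrics Verification
Chapter 4 Improving the Utterance Verification System by Processing Singing Voice Signals
4.1 Improvements from the Audio Signal
4.1.1 Vowel Shrinking
4.1.2 Vowel Decimation
4.2 Improvements from the Acoustic Model
4.2.1 Adding a Duration Model
Chapter 5 Marking the Positions of Incorrectly Sung Lyrics
5.1 Marking Principle
5.2 Marking Performance after Adding the Duration Model
Chapter 6 Experiments and Evaluation
6.1 Database
6.1.1 Description of Simulated Conditions
6.2 Verification Experiment Results
6.2.1 Verification Results without Improvement
6.2.2 Verification Results with Vowel Shrinking
6.2.3 Verification Results with Vowel Decimation Method 1
6.2.4 Verification Results with Vowel Decimation Method 2
6.2.5 Verification Results with the Duration Model
6.3 Marking Experiment Results
Chapter 7 Conclusion and Future Work
References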
Electronic full text (date of public release on the Internet: 2021-08-01)