Author: 林巧薇
Author (English): Chiao-Wei Lin
Thesis title: 應用節奏與頻率資訊之改良式哼唱檢索系統及改良式發端偵測與旋律匹配
Thesis title (English): Improved Query by Humming System Using the Tempo and Frequency Information and Advanced Onset Detection and Melody Matching Methods
Advisor: 丁建均
Advisor (English): Jian-Jiun Ding
Committee members: 郭景明, 葉敏宏, 簡鳳村
Committee members (English): Jing-Ming Guo, Min-Hung Yeh, Feng-Tsun Chien
Oral defense date: 2016-06-28
Degree: Master's
University: 國立臺灣大學 (National Taiwan University)
Department: 電信工程學研究所 (Graduate Institute of Communication Engineering)
Discipline: Engineering
Field: Electrical and computer engineering
Thesis type: Academic thesis
Publication year: 2016
Graduation academic year: 104
Language: English
Number of pages: 94
Keywords (Chinese): 哼唱檢索, 發端辨識, 音頻偵測, 旋律比對, 隱藏式馬可夫模型
Keywords (English): query by humming, beat, onset detection, pitch estimation, melody matching, hidden Markov model
Usage statistics:
  • Cited by: 0
  • Views: 69
  • Downloads: 0
  • Bookmarked: 0
In recent years, voice signal systems have been widely applied, with techniques ranging from signal analysis and feature extraction to synthesis and compression. Applied to a query by humming (QBH) system, these techniques allow a song to be searched by its melody and tempo information rather than by known textual information such as the song title, singer, or lyrics. Since today's music databases are very large, the search should finish within a short time, so the system must balance search efficiency and effectiveness.
A QBH system mainly consists of three parts: the input, the database, and the system algorithm. The input is the signal received by the system, which here is usually a plain singing or humming voice; the database stores the songs available for matching and provides the candidate retrieval results; the system algorithm first extracts features from the input and converts them into a comparable format, such as pitch and note duration, and then uses a matching algorithm to compare the input sequence with the database.
The QBH system uses the matching algorithm to compare the input signal with each song in the database and returns a list of candidate songs ranked by similarity score. However, most people cannot sing a melody exactly as in the original song, and every person's singing style is different, so a good system should handle these situations as well as possible.
In general, matching methods fall into two categories: note-based and frame-based. The former represents both the database and the input signal in units of notes and has the advantage of higher efficiency; the latter works in units of frames and is generally more effective in matching. In this thesis, we use a note-based matching system, propose an improved onset detection method, and improve the melody matching method to enhance the QBH system. In addition, we use our own proposed method for pitch estimation.
For future research, we hope to further improve the efficiency and effectiveness of query by humming; in particular, since the number of songs in a real-world database is very large, computational speed becomes even more important. Although our onset detection method already achieves good results, there is still room for improvement, and we hope future work can reach higher accuracy.

In recent years, voice signal systems have been used in a wide range of applications. The techniques involved include voice signal analysis, feature extraction, voice synthesis, and compression. By applying these techniques to a query by humming (QBH) system, we no longer need conventional textual descriptions such as the song title, singer, or lyrics; instead, we can search for a song by its melody and tempo information. Today's music databases are large and the search result should be returned in a short time, so system efficiency is as important as effectiveness.
A QBH system includes three parts: the input data, the music database, and the matching algorithm. The input of a QBH system is usually human humming or singing. The database is the collection of songs available for search. The system first extracts features of the input signal, such as the pitch and length of notes, and then uses the matching algorithm to compare the input with the songs in the database.
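As a rough illustration of the feature extraction step described above, the sketch below frames a mono recording and estimates one pitch value per frame with the classical autocorrelation method listed in Chapter 2.3.1, then converts each estimate to a MIDI note number. This is a minimal Python sketch under assumed parameter values (sampling rate, frame length, hop size, and pitch search range); it is not the pitch estimation method proposed in the thesis.

import numpy as np

def autocorrelation_pitch(frame, fs, f_min=80.0, f_max=1000.0):
    """Estimate the fundamental frequency of one frame via autocorrelation."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f_max)                 # shortest lag (highest pitch) considered
    lag_max = min(int(fs / f_min), len(ac) - 1)
    if lag_max <= lag_min or ac[0] <= 0:
        return 0.0                            # treat as unvoiced / silence
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / lag

def pitch_contour(x, fs, frame_len=2048, hop=512):
    """Frame the signal and return one pitch estimate (in MIDI) per frame."""
    midi = []
    for start in range(0, len(x) - frame_len, hop):
        f0 = autocorrelation_pitch(x[start:start + frame_len], fs)
        midi.append(69 + 12 * np.log2(f0 / 440.0) if f0 > 0 else 0.0)
    return np.array(midi)

Grouping consecutive frames with similar pitch, delimited by the detected onsets, then yields the note-level pitch and duration sequence used for matching.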
The QBH system compares the input signal with the database and lists candidate songs ranked by matching score. However, people usually cannot sing exactly as in the reference song, and singing style differs from person to person. A good QBH system should be able to deal with all of the problems that can occur with amateur singing.
Generally speaking, melody matching methods fall into at least two types: note-based and frame-based. The advantage of a note-based system is its efficiency, while a frame-based system is generally more effective. In this thesis, we use the note-based method. We propose an advanced onset detection method and an improved melody matching system to improve the QBH system. In addition, we use our own pitch estimation method to estimate the fundamental frequency.
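To make the note-based matching step concrete, the following sketch shows a generic dynamic programming matcher in the spirit of Chapter 3.1 [23][28]: the query and each database song are reduced to pitch-interval sequences (so the match is key-invariant), and an edit-distance-style alignment cost is computed. The interval representation and the unit insertion/deletion cost are illustrative assumptions, not the modified dynamic programming proposed in Chapter 4.3.2.

import numpy as np

def to_intervals(midi_notes):
    """Use pitch intervals so the match is invariant to the singer's key."""
    return np.diff(np.asarray(midi_notes, dtype=float))

def dp_distance(query_notes, song_notes, indel_cost=2.0):
    """Edit-distance-style alignment cost between two note sequences."""
    q, s = to_intervals(query_notes), to_intervals(song_notes)
    m, n = len(q), len(s)
    d = np.zeros((m + 1, n + 1))
    d[:, 0] = np.arange(m + 1) * indel_cost   # cost of deleting remaining query notes
    d[0, :] = np.arange(n + 1) * indel_cost   # cost of skipping leading song notes
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = abs(q[i - 1] - s[j - 1])    # pitch-interval mismatch penalty
            d[i, j] = min(d[i - 1, j - 1] + sub,
                          d[i - 1, j] + indel_cost,
                          d[i, j - 1] + indel_cost)
    return d[m, n]

# Candidate songs can then be ranked by ascending alignment cost, e.g.
# ranking = sorted(database.items(), key=lambda kv: dp_distance(query, kv[1]))

Lower cost means a closer melodic match; in practice such a matcher is usually combined with faster filtering (e.g. linear scaling) before the full dynamic programming comparison.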
In our experiments, we show that the proposed onset detection method achieves better performance than the other methods. However, future work should aim to further improve the efficiency and effectiveness of the system, since real-world music databases are huge.


中文摘要 i
ABSTRACT ii
CONTENTS iv
LIST OF FIGURES vii
LIST OF TABLES x
Chapter 1 Introduction 1
1.1 Background 1
1.2 Musical Characteristics and Terms 3
1.3 Chapter Organization 8
Chapter 2 System Structure of QBH System 11
2.1 Note-based Approaches 12
2.1.1 Onset Detection 13
2.1.2 Difference of Magnitude Method 14
2.1.3 Short-term Energy Method 15
2.1.4 Surf Method 18
2.1.5 Envelope Match Filter 19
2.2 Frame-based Approaches 21
2.3 Pitch Estimation 23
2.3.1 Autocorrelation Function 24
2.3.2 Average Magnitude Difference Function 24
2.3.3 Harmonic Product Spectrum 25
Chapter 3 Melody Matching 27
3.1 Dynamic Programming [28] 27
3.2 Hidden Markov Model [29] 29
3.3 Linear Scaling [25] 33
3.4 Note-based Linear Scaling and Recursive Align [26] 35
3.5 FFT-based Method [27] 37
3.6 Quantized Binary Code-based Linear Scaling [30] 39
Chapter 4 Proposed Algorithm 42
4.1 Proposed Onset Algorithm 43
4.2 Proposed Pitch Estimation 49
4.3 Proposed Melody Matching 53
4.3.1 Proposed Hidden Markov Model 56
4.3.2 Modified Dynamic Programming 62
4.3.3 Music Indexing 69
4.3.4 Progressive Filtering and Score Level Fusion 70
Chapter 5 Experiment Result 73
5.1 Music Dataset 73
5.2 Performance Parameter 74
5.3 Comparison of Onset Detection Performance 76
5.4 Comparison of Melody Matching Performance 81
Chapter 6 Conclusion and Future Work 86
6.1 Conclusion 86
6.2 Future Work 87
REFERENCE 89


A. Query by Humming System
[1] A. Ghias, J. Logan, D. Chamberlin et al., "Query by humming: musical information retrieval in an audio database," Proceedings of the Third ACM International Conference on Multimedia, ACM, pp. 231-236, 1995.
[2] N. Kosugi, Y. Nishihara, S. Kon et al., "Music retrieval by humming-using similarity retrieval over high dimensional feature vector space," Communications, Computers and Signal Processing, 1999 IEEE Pacific Rim Conference on, IEEE, pp. 404-407, 1999.
[3] S. Pauws, "CubyHum: a fully operational query by humming system," ISMIR, pp. 187-196, 2002.

B. Onset Detection
[4] S. Hainsworth, and M. Macleod, "Onset detection in musical audio signals," Proc. Int. Computer Music Conference, pp. 163-166, 2003.
[5] J. P. Bello, L. Daudet, S. Abdallah et al., "A tutorial on onset detection in music signals," Speech and Audio Processing, IEEE Transactions on, vol. 13, no. 5, pp. 1035-1047, 2005.
[6] J.-J. Ding, C.-J. Tseng, C.-M. Hu et al., "Improved onset detection algorithm based on fractional power envelope match filter," Signal Processing Conference, 2011 19th European, IEEE, pp. 709-713, 2011.
[7] J. P. Bello, and M. Sandler, "Phase-based note onset detection for music signals," Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP '03), 2003 IEEE International Conference on, IEEE, vol. 5, pp. V-441-4, 2003.
[8] S. Abdallah, and M. D. Plumbley, "Unsupervised onset detection: a probabilistic approach using ICA and a hidden Markov classifier," Cambridge Music Processing Colloquium, 2003.
[9] A. Klapuri, "Sound onset detection by applying psychoacoustic knowledge," Acoustics, Speech, and Signal Processing, 1999. Proceedings, 1999 IEEE International Conference on, IEEE, vol. 6, pp. 3089-3092, 1999.

C. Pitch Estimation
[10] M. R. Schroeder, "Period Histogram and Product Spectrum: New Methods for Fundamental-Frequency Measurement," The Journal of the Acoustical Society of America, vol. 43, no. 4, pp. 829-834, 1968.
[11] L. Rabiner, M. J. Cheng, A. E. Rosenberg et al., "A comparative performance study of several pitch detection algorithms," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 24, no. 5, pp. 399-418, 1976.
[12] D. J. Hermes, "Measurement of pitch by subharmonic summation," The Journal of the Acoustical Society of America, vol. 83, no. 1, pp. 257-264, 1988.
[13] S. Kadambe, and G. F. Boudreaux-Bartels, "Application of the wavelet transform for pitch detection of speech signals," IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 917-924, 1992.
[14] E. Pollastri, "Melody-retrieval based on pitch-tracking and string-matching methods," Proc. Colloquium on Musical Informatics, Gorizia, 1998.
[15] E. Tsau, N. Cho, and C.-C. J. Kuo, "Fundamental frequency estimation for music signals with modified Hilbert-Huang transform (HHT)," Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on, IEEE, pp. 338-341, 2009.
[16] M. J. Ross, H. L. Shaffer, A. Cohen et al., "Average magnitude difference function pitch extractor," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 22, no. 5, pp. 353-362, 1974.
[17] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903-995, 1998.
[18] X.-D. Mei, J. Pan, and S.-H. Sun, "Efficient algorithms for speech pitch estimation," Intelligent Multimedia, Video and Speech Processing, 2001. Proceedings of 2001 International Symposium on, IEEE, pp. 421-424, 2001.
[19] A. M. Noll, "Cepstrum pitch determination," The Journal of the Acoustical Society of America, vol. 41, no. 2, pp. 293-309, 1967.

D. Melody Matching
[20] J. Shifrin, and W. Birmingham, "Effectiveness of HMM-based retrieval on large databases," Proceedings of the International Conference on Music Information Retrieval (ISMIR), 2003.
[21] Y. Kim, and C. H. Park, "Query by Humming by Using Scaled Dynamic Time Warping," Signal-Image Technology & Internet-Based Systems (SITIS), 2013 International Conference on, IEEE, pp. 1-5, 2013.
[22] G. P. Nam, and K. R. Park, "Fast query-by-singing/humming system that combines linear scaling and quantized dynamic time warping algorithm," International Journal of Distributed Sensor Networks, vol. 2015, 2015.
[23] J.-S. R. Jang, and M.-Y. Gao, "A query-by-singing system based on dynamic programming," Proceedings of International Workshop on Intelligent System Resolutions (8th Bellman Continuum), Hsinchu, pp. 85-89, 2000.
[24] S. Doraisamy, and S. Rüger, "Robust polyphonic music retrieval with n-grams," Journal of Intelligent Information Systems, vol. 21, no. 1, pp. 53-70, 2003.
[25] J.-S. R. Jang, H.-R. Lee, and M.-Y. Kao, "Content-based music retrieval using linear scaling and branch-and-bound tree search," ICME, vol. 1, pp. 289-292, 2001.
[26] J. Yang, J. Liu, and W. Zhang, "A fast query by humming system based on notes," INTERSPEECH, pp. 2898-2901, 2010.
[27] W.-H. Tsai, and Y.-M. Tu, "An Efficient Query-by-Singing/Humming System Based on Fast Fourier Transforms of Note Sequences," Multimedia and Expo (ICME), 2012 IEEE International Conference on, IEEE, pp. 521-525, 2012.
[28] R. Bellman, "Dynamic programming and Lagrange multipliers," Proceedings of the National Academy of Sciences of the United States of America, vol. 42, no. 10, p. 767, 1956.
[29] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[30] G. P. Nam, T. T. T. Luong, H. H. Nam et al., "Intelligent query by humming system based on score level fusion of multiple classifiers," EURASIP Journal on Advances in Signal Processing, vol. 2011, no. 1, pp. 1-11, 2011.

E. Others
[31] Jyh-Shing Roger Jang, "MIR-QBSH Corpus", MIR Lab, CS Dept., Tsing Hua Univ., Taiwan. Available at the "MIR-QBSH Corpus" link at http://www.cs.nthu.edu.tw/~jang.
[32] E. D. Scheirer, "Tempo and Beat Analysis of Musical Signals," The Journal of the Acoustical Society of America, vol. 103, no. 1, pp. 588-601, January 1998.

