臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

詳目顯示 (Detailed Record)
Author: 鮑多福
Author (English): Duo-Fu Bao
Title: 督導式與非督導式曲風分類
Title (English): Supervised and Unsupervised Music Genre Classification
Advisor: 蔡偉和 (Wei-Ho Tsai)
Committee Members: 鄭士康, 廖元甫
Oral Defense Date: 2008-06-06
Degree: Master
Institution: 國立臺北科技大學 (National Taipei University of Technology)
Department: 電腦與通訊研究所 (Institute of Computer and Communication)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2008
Graduation Academic Year: 96 (2007–2008)
Language: Chinese
Pages: 55
Keywords (Chinese): 曲風, 非督導式分類, 高斯混合模型, 主成分分析, 階層式聚合分群
Keywords (English): Music genre, Unsupervised classification, Gaussian mixture model, Principal components analysis, Hierarchical agglomerative clustering
Abstract (translated from the Chinese):
With network transmission becoming ever more convenient, personal music collections are growing by the day, and helping users find the music type or genre they want to hear within a large music database has become a research problem worth investigating in depth. Most existing automatic music genre recognition methods operate in a supervised mode, which requires a sufficient number of sample songs with known genre labels in order to build a representative model for each genre. However, because labeling the genres of songs demands considerable manual effort, and the genre a song belongs to is often ambiguous and hard to label correctly, the supervised genre classification mode is not entirely suitable for managing a personal music database. To address this problem, this thesis proposes an unsupervised genre classification method that clusters songs by means of similarity measurement, so that songs in the same cluster have the same or similar genres, thereby achieving automatic indexing of the music database. Specifically, we develop several methods for measuring song similarity: each music file is first represented as a Gaussian mixture model, the likelihoods of matching each audio file against the others' models are computed, and from these likelihoods song similarity values such as the cross likelihood ratio, Euclidean distance, or cosine distance are derived. We also apply principal component analysis to improve the similarity computation. Hierarchical agglomerative clustering is then used to generate clusters, and the Rand index of the partitioning obtained with each number of clusters is estimated. Finally, exploiting the property that the minimum of the Rand index occurs when the number of clusters equals the actual number of genres, we develop a method for automatically determining the optimal number of clusters. Experimental results confirm the feasibility of the unsupervised genre classification method.
Abstract (English):
Explosive growth in the Internet and digital media has motivated recent research into techniques for helping users locate their desired music styles or genres among numerous options. Existing systems for automatic genre classification follow a supervised framework that extracts genre-specific information from manually labeled music data. However, such systems may not be suitable for personal music management, because manually labeling music by genre is labor intensive and subject to discrepancies between individuals. In this thesis, we study an unsupervised paradigm for music genre classification. It aims to partition a collection of unknown music recordings into several clusters such that each cluster contains recordings of only one genre and different clusters represent different genres. This enables users to organize their personal music databases without specific knowledge of genre. To attain such a partitioning, we develop several methods for measuring the similarity between music recordings. They all start by representing each music recording as a Gaussian mixture model (GMM) and computing the likelihood of every recording under every GMM. Three inter-recording similarity measures based on these likelihoods are then derived, namely the cross likelihood ratio, Euclidean distance, and cosine distance. We further propose using principal component analysis to enhance the similarity measurement. By applying hierarchical agglomerative clustering, the music recordings are partitioned into a tree of clusters, and the Rand index is then estimated for each branch of the cluster tree. Motivated by the fact that the Rand index attains its minimum when the number of clusters equals the true number of genres, we propose determining the optimal number of clusters by searching for the branch of the cluster tree that yields the minimal Rand index. Our experimental results show the feasibility of clustering music recordings by genre.
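The clustering pipeline described in the abstracts can be sketched in code. The following is a minimal illustration in Python using NumPy, scikit-learn, and SciPy (libraries chosen here for convenience; the thesis does not specify an implementation): one GMM is fitted per recording, a symmetrized cross-likelihood-ratio similarity matrix is derived, and hierarchical agglomerative clustering produces the partition. The feature matrices, GMM size, and the exact CLR formula below are illustrative assumptions, not the thesis's own settings.

# Minimal sketch of unsupervised genre clustering via per-recording GMMs,
# a cross-likelihood-ratio (CLR) similarity matrix, and HAC.
# Assumes `recordings` is a list of (n_frames x n_features) arrays,
# e.g. MFCC frames extracted beforehand; all names here are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def clr_similarity(recordings, n_components=8, seed=0):
    # One GMM per recording, trained on that recording's frames only.
    gmms = [GaussianMixture(n_components=n_components,
                            covariance_type="diag",
                            random_state=seed).fit(x)
            for x in recordings]
    n = len(recordings)
    # L[i, j] = average per-frame log-likelihood of recording i under GMM j.
    L = np.array([[gmms[j].score(recordings[i]) for j in range(n)]
                  for i in range(n)])
    d = np.diag(L)
    # Symmetrized CLR: (L[i,j] - L[i,i]) + (L[j,i] - L[j,j]);
    # higher means the two recordings explain each other's frames better.
    return (L - d[:, None]) + (L.T - d[None, :])

def cluster_by_genre(recordings, n_clusters):
    S = clr_similarity(recordings)
    D = S.max() - S                 # turn similarity into a distance
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")  # labels 1..k

In the same spirit, the automatic selection of the number of clusters would amount to evaluating each branch of the cluster tree Z (each candidate number of clusters) with the estimated Rand index and keeping the branch that minimizes it, while applying PCA to the rows of L before computing Euclidean or cosine distances corresponds to the enhanced similarity measurement mentioned above.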
Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iv
Table of Contents v
List of Tables vii
List of Figures viii
Chapter 1 Introduction 1
1.1 Research Motivation and Objectives 1
1.2 Research Approach 2
1.3 Thesis Organization 4
Chapter 2 Background Knowledge 5
2.1 Overview 5
2.2 Related Work 5
2.3 Supervised Genre Classification 8
2.4 Unsupervised Genre Classification 9
Chapter 3 Supervised Music Genre Classification 11
3.1 System Architecture 11
3.2 Timbral Texture Features 12
3.2.1 Pre-Processing 13
3.2.2 Discrete Fourier Transform (DFT) 14
3.2.3 Triangular Filter Bank 14
3.2.4 Mel-Frequency Cepstral Coefficients 16
3.2.5 Spectral Centroid 17
3.2.6 Renyi Entropy 17
3.2.7 Spectral Flux 17
3.2.8 Spectral Rolloff 17
3.2.9 Delta Coefficients 18
3.3 Gaussian Mixture Model 18
3.4 Adaptive Gaussian Mixture Model 20
3.5 Supervised Training Experimental Results 20
3.5.1 Overview 20
3.5.2 Sources of Experimental Data 23
3.5.3 Supervised Genre Classification Results 24
3.5.4 Genre Classification Comparison with Adaptive GMMs 28
Chapter 4 Unsupervised Music Genre Similarity Measurement and Clustering 29
4.1 System Architecture 29
4.2 Clustering Performance Evaluation 30
4.3 Music Genre Similarity Measurement 32
4.4 Music Genre Distance Measurement 34
4.5 Principal Components Analysis (PCA) 37
4.6 Clustering Music Segments by Genre Similarity 39
4.7 Automatic Determination of the Number of Clusters 41
4.8 Unsupervised Clustering Experimental Results 44
4.8.1 Experimental Data 44
4.8.2 Comparison of Similarity Measures in HAC 44
4.8.3 Effect of PCA on Clustering Results 46
4.8.4 Clustering Experimental Results 48
4.8.5 Automatic Clustering Results 51
Chapter 5 Conclusions and Future Work 52
References 53
References
[1] D. Pye, “Content-Based Methods for the Management of Digital Music,” in Proc. IEEE Conf. Acoustics, Speech, Signal Processing (ICASSP), pp. 2437–2440, 2000.
[2] Shih-Chuan Chiu and Man-Kwan Shan, “Computer Music Composition Based on Discovered Music Patterns,” in Proc. IEEE Conference on Systems, Man, and Cybernetics, Taipei, Taiwan, 2006.
[3] Tao Li and Mitsunori Ogihara, “Toward Intelligent Music Information Retrieval,” IEEE Transactions on Multimedia, vol. 8, no. 3, pp. 564–574, June 2006.
[4] C. McKay, Automatic Genre Classification of MIDI Recordings, Master’s thesis, McGill University, Canada, 2004.
[5] M. F. McKinney and J. Breebaart, “Features for audio and music classification,” in Proc. International Symposium on Music Information Retrieval, 2003.
[6] Bozena Kostek, “Musical instrument classification and duet analysis employing music information retrieval techniques,” Proceedings of the IEEE, vol. 92, no. 4, pp. 721–729, Apr. 2004.
[7] J. Saunders, “Real time discrimination of broadcast speech/music,” in Proc. IEEE Conf. Acoustics, Speech, Signal Processing (ICASSP), pp. 993–996, 1996.
[8] E. Scheirer and M. Slaney, “Construction and evaluation of a robust multifeature speech/music discriminator,” in Proc. IEEE Conf. Acoustics, Speech, Signal Processing (ICASSP), pp. 1331–1334, 1997.
[9] D. Kimber and L. Wilcox, “Acoustic segmentation for audio browsers,” in Proc. Interface Conf., Sydney, Australia, July 1996.
[10] A. L. Berenzweig and D. P. Ellis, “Locating singing voice segments within musical signals,” in Proc. Int. Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Mohonk, NY, pp. 119–123, 2001.
[11] G. Tzanetakis and P. Cook, “Musical Genre Classification of Audio Signals,” IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, July 2002.
[12] C. Xu and N. C. Maddage, “Automatic Music Classification and Summarization,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 3, May 2005.
[13] E. Wold, T. Blum, D. Keislar, and J. Wheaton, “Content-based classification, search, and retrieval of audio,” IEEE Multimedia, vol. 3, no. 2, 1996.
[14] J. Foote, “Content-based retrieval of music and audio,” Multimed. Storage Archiv. Syst., pp. 138–147, 1997.
[15] G. Li and A. Khokar, “Content-based indexing and retrieval of audio data using wavelets,” in Proc. IEEE Conf. Multimedia Expo, pp. 885–888, 2000.
[16] S. Li, “Content-based classification and retrieval of audio using the nearest feature line method,” IEEE Trans. Speech Audio Processing, vol. 8, pp. 619–625, Sept. 2000.
[17] X. Shao, C. Xu, and M. Kankanhalli, “Unsupervised classification of musical genre using hidden Markov model,” in Proc. IEEE Int. Conf. Multimedia Expo (ICME), pp. 2023–2026, 2004.
[18] F. Mörchen, A. Ultsch, M. Nöcker, and C. Stamm, “Databionic visualization of music collections according to perceptual distance,” in Proc. International Symposium on Music Information Retrieval, pp. 396–403, 2005.
[19] 王小川, 語音訊號處理 (Speech Signal Processing), Taipei: Chuan Hwa Book Co., 2004 (in Chinese).
[20] http://neural.cs.nthu.edu.tw/jang/books/audioSignalProcessing/
[21] A. Ramalingam and S. Krishnan, “Gaussian Mixture Modeling of Short-Time Fourier Transform Features for Audio Fingerprinting,” IEEE Transactions on Information Forensics and Security, 2006.
[22] A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Statist. Soc., vol. 39, pp. 1–38, 1977.
[23] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.
[24] “Audio Genre Classification, MIREX 2005” (Training part 1 and Training part 2), http://www.musicir.org/mirex/2005/index.php/Audio_Genre_Classification, 2005.
[25] A. Solomonoff, A. Mielke, M. Schmidt, and H. Gish, “Clustering speakers by their voices,” in Proc. IEEE Conf. Acoustics, Speech, Signal Processing (ICASSP), pp. 757–760, 1998.
[26] L. Hubert and P. Arabie, “Comparing partitions,” Journal of Classification, vol. 2, pp. 193–218, 1985.
[27] D. A. Reynolds, E. Singer, B. A. Carson, G. C. O’Leary, J. J. McLaughlin, and M. A. Zissman, “Blind clustering of speech utterances based on speaker and language characteristics,” in Proc. Int. Conf. Spoken Language Processing (ICSLP), pp. 3193–3196, 1998.
[28] M. Turk and A. Pentland, “Eigenfaces for Recognition,” Journal of Cognitive Neuroscience, vol. 3, pp. 71–86, 1991.
[29] W. H. Tsai and H. M. Wang, “Speech utterance clustering based on the maximization of within-cluster homogeneity of speaker voice characteristics,” J. Acoust. Soc. Amer., vol. 120, no. 3, pp. 1631–1645, 2006.
[30] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.