National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 任佳珉
Author (English): Ren, Jia-Min
Title (Chinese): 發掘具鑑別性特徵於音樂曲風/情緒分類之應用
Title (English): Discovering discriminative features with applications to music genre/mood classification
Advisors: 張智星, 張俊盛
Degree: Doctoral
Institution: National Tsing Hua University
Department: Computer Science
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic dissertation
Publication year: 2013
Graduation academic year: 101 (ROC calendar)
Language: English
Pages: 70
Keywords (Chinese): 音樂曲風分類、音樂情緒分類、具時間限制時序性序列、調頻頻譜圖
Keywords (English): Music genre classification, Music mood classification, Time-constrained sequential patterns, Modulation spectrogram
Usage statistics:
  • Cited by: 0
  • Views: 651
  • Downloads: 80
  • Bookmarked: 1
A music piece is usually composed of a sequence of sound events, and these sound events carry both short-term and long-term temporal information. However, in research on automatic music genre classification, most text-categorization-based approaches extract only short-term temporal dependencies (e.g., statistics based on unigram and bigram occurrence counts) to represent music content. In this dissertation, we propose using time-constrained sequential patterns (TSPs) as discriminative features for music genre classification. First, an automatic language identification technique is used to convert each music piece into a sequence of hidden Markov model indices. TSP mining is then applied to these sequences to obtain genre-specific TSPs, after which the occurrence frequency of each mined TSP in every song is computed. Finally, these occurrence frequencies are used to train support vector machines for classification. Experiments on two datasets widely used in music genre classification, GTZAN and ISMIR2004Genre, show that the proposed method discovers more discriminative temporal structures and achieves better recognition accuracy than unigram- and bigram-based statistical approaches.
In addition, we propose another music genre/mood classification system that combines short-term frame-based timbre features with long-term modulation spectral analysis of those timbre features, using support vector machines as the classifier. This system won first place in the MIREX 2011 music mood classification task. In our submission, conventional modulation spectral analysis was applied to short-term timbre features to extract long-term modulation features. However, two steps in this analysis tend to discard useful modulation information and thus limit classification performance. The first is averaging the modulation spectrograms extracted from texture windows (each composed of timbre features extracted from hundreds of frames) to obtain a single representative modulation spectrogram per song. The second is computing the mean and standard deviation of the modulation spectral contrast/valley matrices (derived from that representative modulation spectrogram) to obtain a compact feature vector per song. To avoid smoothing out modulation information, this dissertation proposes extracting joint acoustic-modulation frequency features from a two-dimensional representation of acoustic frequency versus modulation frequency. These joint frequency features, including the acoustic-modulation spectral contrast/valley (AMSC/AMSV) and the acoustic-modulation spectral flatness/crest measures (AMSFM/AMSCM), are computed from the modulation spectrum of each joint acoustic-modulation frequency subband. By combining the proposed features with the modulation spectral analysis of MFCC and statistical descriptors of short-term timbre features, this new feature set outperforms our MIREX 2011 method on four other genre/mood datasets.
A music piece usually consists of a sequence of sound events that carry both short-term and long-term temporal information. However, in automatic music genre classification, most text-categorization-based approaches capture only local temporal dependencies (e.g., unigram- and bigram-based occurrence statistics) to represent music content. In this dissertation, we propose using time-constrained sequential patterns (TSPs) as effective features for music genre classification. First, an automatic language identification technique is applied to tokenize each music piece into a sequence of hidden Markov model indices. TSP mining is then applied to these sequences to discover genre-specific TSPs, followed by the computation of the occurrence frequency of each TSP in every music piece. Finally, these occurrence frequencies are fed into support vector machines (SVMs) to perform the classification task. Experiments conducted on two widely used datasets, GTZAN and ISMIR2004Genre, show that the proposed method discovers more discriminative temporal structures and achieves better recognition accuracy than the unigram- and bigram-based statistical approach.
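For illustration only, the following Python sketch mirrors the pipeline described above (tokenized songs, TSP occurrence frequencies, then an SVM). The token sequences, the mined patterns, and the helpers count_tsp and tsp_feature_matrix are hypothetical simplifications; the greedy gap-constrained matcher below stands in for the dissertation's actual time-indexed TSP mining and feature-weighting procedure.

```python
import numpy as np
from sklearn.svm import SVC  # an SVM classifier (the dissertation uses LIBSVM)

def count_tsp(tokens, pattern, max_gap=2):
    """Count matches of `pattern` in `tokens` in which consecutive pattern
    elements are separated by at most `max_gap` other tokens (a greedy,
    simplified form of time-constrained sequential pattern matching)."""
    count = 0
    for start in range(len(tokens)):
        if tokens[start] != pattern[0]:
            continue
        pos, matched = start, True
        for sym in pattern[1:]:
            # look for the next pattern element within the allowed gap window
            window = tokens[pos + 1 : pos + 2 + max_gap]
            if sym in window:
                pos += 1 + window.index(sym)
            else:
                matched = False
                break
        count += matched
    return count

def tsp_feature_matrix(songs, tsps, max_gap=2):
    """songs: list of ASM-index sequences; tsps: list of mined patterns."""
    return np.array([[count_tsp(s, p, max_gap) for p in tsps] for s in songs])

# Toy data: three "songs" over four ASM indices and two hypothetical TSPs.
songs = [[0, 1, 2, 1, 3], [2, 2, 3, 0, 1], [0, 1, 1, 2, 3]]
labels = [0, 1, 0]
tsps = [(0, 1), (2, 3)]

X = tsp_feature_matrix(songs, tsps)
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(X))
```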
In addition, we propose another music genre/mood classification system that combines short-term frame-based timbre features with long-term modulation spectral analysis of timbre features for SVM classification. This system won first place in the MIREX 2011 music mood classification task. In our submission, we performed modulation spectral analysis on short-term timbre features to extract long-term modulation features. However, two operations in this analysis are likely to smooth out useful modulation information, which may degrade classification performance. The first is averaging the modulation spectrograms extracted from texture windows (each composed of timbre features extracted from hundreds of frames) to create a representative modulation spectrogram for a music clip. The second is computing the mean and standard deviation of the modulation spectral contrast/valley matrices (both computed from the representative modulation spectrogram) to obtain a compact feature vector for a music clip. To avoid smoothing out modulation information, this dissertation proposes using a two-dimensional representation of acoustic frequency and modulation frequency to compute joint frequency features. These joint frequency features, including the acoustic-modulation spectral contrast/valley (AMSC/AMSV) and the acoustic-modulation spectral flatness/crest measures (AMSFM/AMSCM), are computed from the modulation spectrum of each joint frequency subband. By combining the proposed features with the modulation spectral analysis of MFCC and statistical descriptors of short-term timbre features, this new feature set outperforms our MIREX 2011 submission on four other genre/mood datasets.
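To make the joint acoustic-modulation representation concrete, here is a rough NumPy/SciPy sketch under assumed settings (band counts, window sizes, and the exact contrast definition are illustrative, not the dissertation's): it forms a modulation spectrogram by Fourier-transforming the temporal envelope of each acoustic-frequency bin, then computes contrast-, valley-, flatness-, and crest-style measures over joint acoustic-modulation subbands. It deliberately skips the texture-window processing discussed above.

```python
import numpy as np
from scipy.signal import stft

def joint_am_features(x, fs, n_acoustic_bands=8, n_mod_bands=4):
    """Sketch of joint acoustic-modulation spectral features in the spirit of
    AMSC/AMSV/AMSFM/AMSCM; subband edges here are uniform and illustrative."""
    # Short-term spectrogram: acoustic frequency x time.
    _, _, Z = stft(x, fs=fs, nperseg=1024, noverlap=512)
    S = np.abs(Z)                                    # (freq_bins, frames)
    # Modulation spectrum: magnitude FFT of each bin's temporal envelope.
    M = np.abs(np.fft.rfft(S, axis=1))               # (freq_bins, mod_bins)
    a_edges = np.linspace(0, M.shape[0], n_acoustic_bands + 1, dtype=int)
    m_edges = np.linspace(0, M.shape[1], n_mod_bands + 1, dtype=int)
    feats = []
    for i in range(n_acoustic_bands):
        for j in range(n_mod_bands):
            sub = M[a_edges[i]:a_edges[i + 1], m_edges[j]:m_edges[j + 1]] + 1e-12
            peak, valley = sub.max(), sub.min()
            feats += [
                np.log(peak) - np.log(valley),             # contrast (AMSC-like)
                np.log(valley),                            # valley   (AMSV-like)
                np.exp(np.log(sub).mean()) / sub.mean(),   # flatness (AMSFM-like)
                peak / sub.mean(),                         # crest    (AMSCM-like)
            ]
    return np.array(feats)

# Toy usage: a one-second 440 Hz tone with a 4 Hz amplitude modulation.
fs = 22050
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
print(joint_am_features(x, fs).shape)  # (n_acoustic_bands * n_mod_bands * 4,)
```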
Chinese Abstract I
Abstract III
Acknowledgements V
Contents VI
List of Figures VIII
List of Tables X
CHAPTER 1. Introduction 1
CHAPTER 2. Proposed Time-Constrained Sequential Patterns (TSPs)-Based Music Genre Classification System 3
2.1. Related Work on Text-Categorization-Based Approaches 3
2.2. Acoustic Units Based Music Transcripts 5
2.2.1. Initial Transcript Generation 6
2.2.2. ASM Training 8
2.3. TSP-Based Music Genre Classification 9
2.3.1. Definition of Time-Constrained Sequential Pattern (TSP) 10
2.3.2. Discriminative TSPs Discovery 12
2.3.2.1. Minimum Support 13
2.3.2.2. Maximum Gap 14
2.4. TSPs Occurrence Frequency Computation 15
2.5. Feature Weighting and Classifier Design 16
CHAPTER 3. Experiments of the Proposed TSPs-Based Music Genre Classification System 18
3.1. Datasets 18
3.2. Evaluation Method and Experimental Setups 18
3.3. Classification Results 22
3.4. TSPs Analysis—Lengths and Counts 27
3.5. TSPs Analysis—Repeating Tokens 29
3.6. Comparisons to Other Existing Approaches 30
3.7. Computation Complexity (Computation Time) 32
3.8. Summary 33
CHAPTER 4. Proposed Timbre and Modulation Features-Based Music Genre/Mood Classification System 34
4.1. Related Work 34
4.2. Audio Feature Extraction 36
4.2.1. Frame-Based Audio Features 37
4.2.2. Modulation Spectral Analysis of MFCC, OSC, and SFM/SCM (MMFCC, MOSC, and MSFM/MSCM) 41
4.2.3. Proposed Joint Frequency Features: Acoustic-Modulation Spectral Contrast/Valley (AMSC/AMSV), and Acoustic-Modulation Spectral Flatness/Crest Measure (AMSFM/AMSCM) 43
4.3. MIREX 2011 Method 46
4.4. The Proposed Method: On the Combination of Joint Frequency Features with MMFCC and Statistical Descriptors of Short-Term Timbre Features 47
CHAPTER 5. Experiments of MIREX 2011 Method and the Proposed Timbre and Modulation Features-Based Music Genre/Mood Classification System 48
5.1. Music Genre/Mood Datasets 48
5.2. Evaluation Method and Experimental Setup 50
5.3. MIREX 2011 Results 51
5.4. Extended Experiments 52
5.5. Summary 57
CHAPTER 6. Conclusions and Future Work 59
References 61