
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 呂行
Author (English): Shing Hermes Lyu
Title: 利用結構性支撐向量機的具音樂表現能力之半自動電腦演奏系統
Title (English): A Semi-automatic Computer Expressive Music Performance System Using Structural Support Vector Machine
Advisor: 鄭士康
Advisor (English): Shyh-Kang Jeng
Committee Members: 陳宏銘, 王育雯, 王真儀
Committee Members (English): Homer Chen
Oral Defense Date: 2014-06-05
Degree: Master's
Institution: 國立臺灣大學 (National Taiwan University)
Department: 電機工程學研究所 (Graduate Institute of Electrical Engineering)
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Document Type: Academic thesis
Year of Publication: 2014
Graduation Academic Year: 102
Language: English
Number of Pages: 69
Keywords (Chinese): 電腦自動演奏, 結構性支撐向量機, 支撐向量機
Keywords (English): Computer Expressive Performance, Performance Rendering, Structural SVMs, Support Vector Machines
Usage counts:
  • Cited: 0
  • Views: 993
  • Ratings:
  • Downloads: 17
  • Bookmarked: 0
Abstract:
Computer-synthesized music has long been regarded as rigid, mechanical, and lacking in musical expressiveness. An automatic computer performance system capable of expressive playing would therefore have a major impact on the music industry, personalized entertainment, and the performing arts. In this thesis, we design a computer performance system that generates expressive music using a structural support vector machine with a hidden-Markov-model structure (SVM-HMM). We invited six graduate students to record Muzio Clementi's Sonatinas Op. 36. We manually split these recordings into phrases and extracted musical features from them programmatically. After the SVM-HMM trains a mathematical model from these features, the model can be used to perform scores unseen during training (whose phrases must be annotated manually). The system currently supports only monophonic melodies. A questionnaire survey shows that the music the system produces does not yet reach the level of a human performance, but according to a quantitative similarity analysis, it is indeed closer to human performance than inexpressive MIDI renditions.
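The feature-extraction step described above can be sketched with the music21 toolkit, which appears among the software the thesis credits (Appendix A, "Software Tools Used in This Research"). The following is a minimal, illustrative sketch for a monophonic part only; the thesis's actual feature set is defined in its Section 3.5, and the feature choices and file name here are assumptions:

    from music21 import converter

    def score_features(xml_path):
        # Per-note score features (MIDI pitch, beat position, duration in
        # quarter lengths) for a monophonic score; a guess at the kind of
        # features Section 3.5.1 covers, not the thesis's exact set.
        score = converter.parse(xml_path)
        return [[n.pitch.midi, float(n.beat), float(n.duration.quarterLength)]
                for n in score.recurse().notes]

    # Hypothetical file name; any monophonic MusicXML score works.
    features = score_features("sonatina_op36_no1.xml")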


Abstract (English):
Computer-generated music is widely perceived as robotic and inexpressive. A computer system that can generate expressive performances could therefore have a significant impact on the music production industry, personalized entertainment, and even the performing arts. In this thesis, we design and implement a system that generates expressive performances using a structural support vector machine with hidden-Markov-model output (SVM-HMM). We recorded six sets of Muzio Clementi's Sonatinas Op. 36, performed by six graduate students. The recordings and scores were manually split into phrases, and their musical features were extracted automatically. Using the SVM-HMM algorithm, we learned a mathematical model of expressive-performance knowledge from these features. The trained model can then generate expressive performances for previously unseen scores (with user-assigned phrasing). The system currently supports monophonic music only. Subjective tests show that the computer-generated performances still cannot match the expressiveness of human performers, but quantitative similarity measures show that they are much more similar to human performances than inexpressive MIDI renditions are.
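The learning step maps onto Joachims' SVM^hmm command-line tools, which the thesis references directly. A minimal sketch of training, assuming per-note performance labels have already been quantized to positive integers; the toy feature values, label classes, -c value, and file names are all illustrative assumptions, not the thesis's actual configuration:

    import subprocess

    def write_svmhmm_file(phrases, path):
        # One sequence (qid) per phrase, one line per note, in SVM^hmm's
        # SVM-light-style format: "<tag> qid:<n> <featnum>:<value> ...".
        with open(path, "w") as f:
            for qid, phrase in enumerate(phrases, start=1):
                for tag, feats in phrase:
                    vals = " ".join(f"{i}:{v}" for i, v in enumerate(feats, start=1))
                    f.write(f"{tag} qid:{qid} {vals}\n")

    # Toy data: each note is (quantized performance label, score-feature vector).
    phrases = [
        [(2, [60, 1.0, 0.5]), (3, [62, 2.0, 0.5]), (2, [64, 3.0, 1.0])],
        [(1, [64, 1.0, 1.0]), (2, [62, 2.0, 1.0])],
    ]
    write_svmhmm_file(phrases, "train.dat")

    # Train; -c trades off margin size against training error. Tagging an
    # unseen, phrase-annotated score file works the same way with
    # svm_hmm_classify. Assumes the SVM^hmm binaries are on PATH.
    subprocess.run(["svm_hmm_learn", "-c", "5", "train.dat", "model.dat"], check=True)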


Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Goal and Contribution
  1.3 Chapter Organization
2 Previous Works
  2.1 Various Goals and Evaluation
  2.2 Research Classified by Methods Used
  2.3 Additional Specialties
3 Proposed Method
  3.1 Overview
  3.2 A Brief Introduction to SVM-HMM
  3.3 Learning Performance Knowledge
    3.3.1 Training Sample Loader
    3.3.2 Feature Extraction
    3.3.3 SVM-HMM Learning
  3.4 Performing Expressively
    3.4.1 SVM-HMM Generation
    3.4.2 MIDI Generation and Synthesis
  3.5 Features
    3.5.1 Score Features
    3.5.2 Performance Features
    3.5.3 Normalizing Onset Deviation
4 Corpus Preparation
  4.1 Existing Corpora
  4.2 Corpus Specification
  4.3 Implementation
    4.3.1 Score Preparation
    4.3.2 MIDI Recording
    4.3.3 MIDI Cleaning and Phrase Splitting
  4.4 Results
5 Experiments
  5.1 Onset Deviation Normalization
  5.2 Parameter Selection
    5.2.1 SVM-HMM-related Parameters
    5.2.2 Quantization Parameter
  5.3 Human-like Performance
6 Conclusions
Bibliography
A Software Tools Used in This Research

