National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

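Author: 林鑫彤
Author (English): Lin, Chin-Tung
Thesis title: 利用卷積式注意力機制語言模型為影片生成鋼琴樂曲
Thesis title (English): InverseMV: Composing Piano Scores with a Convolutional Video-Music Transformer
Advisor: 沈錳坤
Advisor (English): Shan, Man-Kwan
Oral defense committee: 劉昭麟, 吳建銘, 鄭麗珍
Oral defense committee (English): Liu, Chao-Lin; Wu, Jiann-Ming
Oral defense date: 2019-07-31
Degree: Master's
Institution: National Chengchi University (國立政治大學)
Department: Department of Computer Science
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 28
Keywords (translated from Chinese): music generation for video; music generation; convolutional attention-mechanism model; piano score generation; video soundtrack
Keywords (English): Video-Music Transformer; VMT; InverseMV; VMT Model; Convolutional Video-Music Transformer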
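In recent years, smartphone camera technology has matured, and with the rise of social networking sites such as Facebook and Instagram, users can easily shoot high-quality photos and videos on their phones and share them online. A high-traffic video usually comes with music that suits it, but most people are not professional soundtrack composers. Limited by their access to music material and by their musical sensitivity, they often struggle to choose background music for a video. Using existing music as a soundtrack is also constrained by copyright, so automatic music generation for video soundtracks is likely to become a new research direction.
With the rapid development of neural networks (NN) in recent years, many studies have begun to use neural network models to generate symbolic music, but to the best of our knowledge no one has yet attempted to generate music for video. Lacking an existing dataset, we manually collected and annotated a pop music dataset as training data for our model. Motivated by the success of the attention-based Transformer model on natural language processing (NLP) problems, and by the similarity between symbolic music generation and language generation, this study proposes VMT (Video-Music Transformer), a model that automatically generates a soundtrack for a video: it takes the video's frame sequence as input and generates the corresponding symbolic piano music. In our experiments, the VMT model also achieves better results than a sequence-to-sequence model in both music fluency and how well the music matches the video.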
With the wide popularity of social media such as Facebook, Twitter, Instagram, and YouTube, and with advances in mobile photography, social media users increasingly watch and share videos rather than text. People want their videos to achieve a high click-through rate. However, such videos require strong editing skills and well-matched music, both of which are difficult for most people. On top of that, people creating soundtracks often do not own the rights to existing musical pieces. Using music generated by a model instead of existing music helps avoid copyright infringement.
The rise of deep learning has led to much work that uses neural network models to generate symbolic music. However, to the best of our knowledge, no prior work composes music for video, and no dataset pairs video with music. Therefore, we release a new dataset comprising over 7 hours of piano scores, with fine alignment between pop music videos and MIDI files. We propose VMT (Video-Music Transformer), a model that generates piano scores from video frames, and we evaluate it against a seq2seq baseline, obtaining better music smoothness and relevance to the video.
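The abstract names the Transformer as the basis of VMT but gives no architectural details. As a rough illustration only, the following Python sketch shows scaled dot-product attention, the core operation of a Transformer, arranged so that music-token queries attend over video-frame feature vectors. The function names, the toy vectors, and the choice to show cross-attention between music and frames are assumptions made for this example, not the thesis's implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    Returns (outputs, weights), where weights[i][j] is how strongly
    query i attends to key j, and outputs[i] is the weighted sum of values.
    """
    d = len(keys[0])  # key dimension, used for the 1/sqrt(d) scaling
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy data: 3 video-frame feature vectors and 2 music-token queries.
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
music_queries = [[1.0, 0.0], [0.0, 2.0]]

context, attn = cross_attention(music_queries, frame_features, frame_features)
for i, row in enumerate(attn):
    print(f"music token {i} attention over frames:", [round(w, 3) for w in row])
```

In this toy setup each music token ends up with one attention weight per frame, and each row of weights sums to 1. A full model would learn the query, key, and value projections and would stack several such layers.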
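Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Research Method
1.4 Contributions
Chapter 2 Related Work
2.1 Background Music Recommendation
2.2 Automatic Music Composition
2.3 Deep Learning
Chapter 3 Methodology
3.1 Data Preprocessing
3.2 Convolutional Video-Music Transformer
3.3 Seq2seq (Baseline)
Chapter 4 Dataset
4.1 Data Collection and Processing
4.2 Video-Music Alignment
4.3 Dataset Overview
Chapter 5 Experimental Design
5.1 Model Training
5.2 Evaluation Method
5.3 Experimental Results
5.3.1 User Bias
5.3.2 Problem of Seq2seq
Chapter 6 Conclusion
References
Appendix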