National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: 劉至峻
Author (English): Chih-Chun Liu
Thesis Title: 使用多模型合併之深度學習應用於音樂片段人聲辨識
Thesis Title (English): Deep Learning Algorithm Using Multi-model Combination Applied to Singing Voice Detection
Advisors: 劉建宏、尤信程
Advisors (English): Chien-Hung Liu; Shing-Chern You
Oral Defense Committee: 蔡偉和、張寶基、劉建宏、尤信程
Oral Defense Date: 2018-07-12
Degree: Master's
Institution: National Taipei University of Technology
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2018
Graduation Academic Year: 106 (2017-2018)
Language: Chinese
Number of Pages: 45
Keywords (Chinese): 整體式學習 (Ensemble Learning)、遞歸神經網絡 (Recurrent Neural Network)、卷積神經網路 (Convolutional Neural Network)、音樂特徵 (Music Features)、類神經網路 (Neural Network)、深度學習 (Deep Learning)
Keywords (English): Ensemble Learning; Recurrent Neural Network; Convolutional Neural Network; Vocal Detection; Neural Network; Deep Learning
Usage statistics:
  • Cited by: 5
  • Views: 605
  • Rating:
  • Downloads: 105
  • Saved to bibliography lists: 0
Using a machine to classify whether a piece of music contains singing voice has long been an important problem. A previous study showed that feeding the spectral magnitude values obtained from the fast Fourier transform (FFT) directly into a convolutional neural network (CNN) achieves an accuracy of about 92%. To explore ways of further improving the accuracy, this thesis applies ensemble learning to combine the CNN with other neural network architectures, such as Long Short-Term Memory (LSTM), Convolutional LSTM, and Capsule Networks, in order to exploit the strengths of each architecture and see whether the accuracy can be improved. The combination methods used in this thesis include voting, fusion, and post classification, and their accuracies are compared individually. In addition to the Jamendo music corpus, this thesis also uses self-built datasets to verify the effectiveness of the methods. On the Jamendo dataset, combining multiple architectures with voting or post classification reaches a highest average accuracy of 94.2%, which is higher than that of any single architecture; on the self-built datasets, the combined accuracy is also generally better than the best accuracy of any single model. This thesis therefore confirms that ensemble learning is effective for singing voice detection.
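As a concrete illustration of the baseline described in the abstract, the sketch below shows how the short-time spectral magnitude of an audio excerpt can be computed and passed to a small CNN that outputs a vocal / non-vocal probability. This is only a minimal sketch, assuming librosa and TensorFlow/Keras are available; the file name, frame sizes, and network layout are illustrative assumptions, not the architecture used in the thesis.

```python
# Minimal sketch: short-time spectral magnitude features fed to a small CNN.
# The file "example.mp3", the frame sizes, and the network layout are illustrative
# assumptions, not the configuration reported in the thesis.
import numpy as np
import librosa
import tensorflow as tf

def stsa_excerpts(path, sr=16000, n_fft=1024, hop=512, frames=64):
    """Load audio and cut its magnitude spectrogram into fixed-size excerpts."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mag = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))   # (freq_bins, time)
    n = mag.shape[1] // frames
    excerpts = [mag[:, i * frames:(i + 1) * frames] for i in range(n)]
    return np.stack(excerpts)[..., np.newaxis]                   # (n, freq, time, 1)

def build_cnn(input_shape):
    """Small CNN that maps one spectrogram excerpt to P(vocal)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

if __name__ == "__main__":
    x = stsa_excerpts("example.mp3")                 # hypothetical input file
    model = build_cnn(x.shape[1:])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    print(model.predict(x[:4]))                      # untrained probabilities, shape (4, 1)
```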
Detecting the vocal sound in a piece of audio is a fundamental step in many advanced audio processing techniques. A previous study showed that an accuracy of about 92% is achievable for this problem with convolutional neural networks (CNN) using the spectrogram as the input feature. To explore further performance improvements, in this thesis we attempted to incorporate the CNN and other neural network architectures, such as Long Short-Term Memory (LSTM), Convolutional LSTM, and Capsule Networks, into ensemble learning. The ensemble learning approaches studied in this thesis include voting, fusion, and post classification, and the accuracy of each approach is reported. Regarding the training/testing data, in addition to the well-known Jamendo dataset, we also built in-house datasets to validate the studied approaches. When using the Jamendo dataset, the average accuracy reached 94.2% with the voting or post classification approach, which is higher than that of any single architecture. When tested on the in-house datasets, the voting and post classification approaches also yielded better accuracy than any single model. Overall, this thesis confirms that ensemble learning is effective in improving accuracy for the vocal detection problem.
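The two combination schemes that perform best in the abstract, voting and post classification, can be sketched as follows. This is a minimal sketch, not the thesis code: the three base models are represented only by their per-segment vocal probabilities, and the logistic-regression post classifier is an illustrative choice of stacking classifier.

```python
# Minimal sketch of two combination schemes named in the abstract:
# majority voting over per-segment decisions, and post classification (stacking),
# where a small classifier is trained on the stacked base-model outputs.
# The random "probabilities" stand in for real model predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def majority_vote(prob_matrix, threshold=0.5):
    """prob_matrix: (n_models, n_segments) vocal probabilities from the base models."""
    votes = (prob_matrix >= threshold).astype(int)                     # hard decision per model
    return (votes.sum(axis=0) > prob_matrix.shape[0] / 2).astype(int)  # majority rule

def train_post_classifier(prob_matrix, labels):
    """Fit a post classifier on the base-model outputs (one feature per model)."""
    clf = LogisticRegression()
    clf.fit(prob_matrix.T, labels)
    return clf

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.random((3, 1000))                    # fake outputs of 3 base models
    labels = (probs.mean(axis=0) > 0.5).astype(int)  # toy ground truth for the demo
    voted = majority_vote(probs)
    post = train_post_classifier(probs, labels).predict(probs.T)
    print("voting accuracy:", (voted == labels).mean())
    print("post-classification accuracy:", (post == labels).mean())
```

A realistic setup would train the post classifier on held-out validation predictions rather than on the data it is evaluated on; the toy labels here only make the example runnable.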
Abstract i
Acknowledgments iii
Table of Contents iv
List of Tables vii
List of Figures viii
Chapter 1 Introduction 1
1.1. Research Motivation and Objectives 1
1.2. Definitions of Terms 1
1.3. Organization of the Thesis 2
Chapter 2 Related Work and Background 3
2.1. Artificial Neural Networks 3
2.1.1. Single-Layer Neural Networks 3
2.1.2. Multi-Layer Neural Networks 4
2.2. Deep Learning 5
2.2.1. Convolutional Neural Network (CNN) 5
2.2.2. Recurrent Neural Network (RNN) 7
2.3. The TensorFlow Deep Learning Framework 9
2.4. Keras 9
2.5. The Jamendo Music Corpus 10
2.5.1. Corpus Sets 10
2.5.2. Ground Truth 10
2.6. The FMA Music Dataset 10
2.7. Review of Related Literature 11
Chapter 3 Deep Learning for Singing Voice Detection 13
3.1. MCNN: CNN Architecture with MFCC Features 13
3.2. CapsNet (Capsule Networks) Architecture 14
3.3. STSA (Short-Time Spectral Amplitude) 15
3.4. SCNN: CNN Architecture with STSA Features 16
3.5. SLSTM: LSTM Architecture with STSA Features 16
3.6. SConvLSTM: ConvLSTM Architecture with STSA Features 17
3.7. Combination and Fusion 17
3.7.1. Combination: Voting 18
3.7.2. Fusion 18
3.7.3. Combination: Post Classification 19
Chapter 4 System Implementation and Experiments 20
4.1. Deep Learning System Environment and Architecture 21
4.2. Music Data Preprocessing System 21
4.2.1. Jamendo Training Data 21
4.2.2. FMA-C-1 Dataset 22
4.2.3. Test Hard Dataset 22
4.3. Data Processing for Voting and Post Classification 22
4.3.1. Validating Model Accuracy 23
4.3.2. Voting Data Processing 23
4.3.3. Post Classification Data Processing 23
4.4. Test Accuracy of Individual Model Architectures 24
4.4.1. Experiment 1: Training and Testing on the Jamendo Dataset 24
4.4.2. Experiment 2: Training and Testing on the FMA-C-1 Dataset 26
4.4.3. Experiment 3: Testing the Test Hard Dataset with Jamendo and FMA-C-1 Weights 28
4.5. Accuracy of Combination by Voting 28
4.5.1. Experiment 4: Voting on the Jamendo Test Set with Different Weights and Model Architectures 28
4.5.2. Experiment 5: Voting on the FMA-C-1 Test Set with Different Weights and Model Architectures 30
4.5.3. Experiment 6: Voting on the Test Hard Dataset with Different Weights and Model Architectures 31
4.6. Accuracy of the Fusion Approach 32
4.6.1. Experiment 7: Fusion Training of Different Model Architectures on the Jamendo Dataset 32
4.6.2. Experiment 8: Fusion Training of Different Model Architectures on the FMA-C-1 Dataset 34
4.7. Accuracy of Combination by Post Classification 36
4.7.1. Experiment 9: Post Classification on the Jamendo Dataset 36
4.7.2. Experiment 10: Post Classification on the FMA-C-1 Dataset 39
4.7.3. Experiment 11: Post Classification on the Test Hard Dataset 39
4.8. Cross-Validation with Jamendo and FMA-C-1 Weights 40
4.8.1. Experiment 12: Cross-Validation with Jamendo and FMA-C-1 Weights 40
4.9. Results and Discussion 41
Chapter 5 Conclusions and Future Work 42
5.1. Conclusions 42
5.2. Future Work 42
References 44