
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

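Author: 蔡鈺群
Author (romanized): Tsai, Yu-Chun
Thesis title: 變概率貝氏推論之非負矩陣拆解應用於聲音聲源分離
Thesis title (English): Variational Bayesian Inference Nonnegative Matrix Factorization with Application to Auditory Streaming
Advisor: 冀泰石
Advisor (romanized): Chi, Tai-Shih
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Master Program of Sound and Music Innovative Technologies, College of Engineering (工學院聲音與音樂創意科技碩士學位學程)
Discipline: Engineering
Field: Other engineering
Thesis type: Academic thesis
Year of publication: 2013
Academic year of graduation: 102 (ROC calendar)
Language: Chinese
Number of pages: 72
Keywords (Chinese): 非負矩陣拆解 (nonnegative matrix factorization); 聲音聲源分離 (audio source separation); 貝氏統計 (Bayesian statistics); 變概率分布 (variational distribution)
Keywords (English): nonnegative matrix factorization; variational Bayesian approach; audio source separation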
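With the rapid development of computing technology in recent years, sound source separation has become an important topic. Nonnegative matrix factorization (NMF) is a multivariate analysis algorithm that is well suited to music source separation. This thesis therefore uses NMF as the core algorithm for audio source separation and combines it with the variational Bayesian approach and the concept of hyperparameters, casting NMF in Bayesian form so that it separates audio more accurately than earlier statistical formulations of NMF. In addition, the proposed method analyzes the FFT spectrum in equivalent rectangular bandwidth (ERB) bands, which approximate the frequency bands of human hearing, and uses the result of this analysis as the observed variable. This effectively reduces the computation time of the NMF algorithm and also emphasizes the main resonance peaks of the music signal. The 4 hyperparameters used admit 54 combinations, and some of these combinations are not suitable for separating music spectra. The thesis therefore proposes first selecting the applicable hyperparameter combinations by checking whether the lower-bound function diverges, and then choosing the best combination with the PEASS (perceptual evaluation methods for audio source separation) scoring tool.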
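The ERB-domain analysis mentioned in this paragraph can be illustrated with a minimal sketch. It assumes the Glasberg and Moore ERB-rate formula and a simple rectangular pooling of FFT power bins into bands that are equally wide on the ERB-rate scale; the record does not say which filter shapes or band counts the thesis uses, so the function names and the pooling rule here are illustrative assumptions only.

```python
import math


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (Glasberg and Moore)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def pool_power_spectrum(power, sample_rate, n_bands):
    """Sum the FFT power bins of one frame into ERB-spaced bands.

    power: power values for FFT bins 0..N/2 (DC up to Nyquist).
    Rectangular pooling is an assumption made for this sketch.
    """
    nyquist = sample_rate / 2.0
    bin_hz = nyquist / (len(power) - 1)
    top = hz_to_erb_rate(nyquist)
    bands = [0.0] * n_bands
    for k, p in enumerate(power):
        # Relative position of this bin on the ERB-rate scale, in [0, 1].
        position = hz_to_erb_rate(k * bin_hz) / top
        index = min(int(position * n_bands), n_bands - 1)
        bands[index] += p
    return bands


# A 513-bin frame is reduced to 32 observations per frame.
frame = [1.0] * 513
erb_frame = pool_power_spectrum(frame, sample_rate=16000, n_bands=32)
print(len(frame), "->", len(erb_frame))
```

Because each frame shrinks from hundreds of FFT bins to a few dozen bands, the matrix handed to NMF has far fewer rows, which is consistent with the reduced computation time the abstract reports.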
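For the experiments, the thesis uses audio provided by the Signal Separation Evaluation Campaign (SiSEC 2013) and uses PEASS as the measure of audio quality after separation. The experiments first use music excerpts as learning samples to find the 4 best hyperparameter combinations, and then apply these settings to separate vocals and various instruments from a range of recordings. The results show that, with the best hyperparameter settings, the proposed method successfully separates the main source information and performs well after only a few iterations.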
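The two-stage hyperparameter selection described in the abstract (screen out combinations whose lower bound diverges, then rank the rest by a perceptual score) might be sketched as follows. This is not the thesis code. The divergence test (a non-finite value, or a drop in the bound beyond a tolerance) is an assumption, and `score_fn` is a hypothetical stand-in for a PEASS-based score.

```python
import math


def bound_diverged(trace, tol=1e-6):
    """Return True if a lower-bound trace looks divergent.

    Assumed criterion: any non-finite value, or any decrease larger
    than tol between consecutive iterations.
    """
    if any(not math.isfinite(value) for value in trace):
        return True
    return any(cur < prev - tol for prev, cur in zip(trace, trace[1:]))


def screen_combinations(traces):
    """Keep the hyperparameter combinations whose bound did not diverge.

    traces: dict mapping a combination (tuple of 4 values) to the list
    of lower-bound values recorded over the iterations.
    """
    return [combo for combo, trace in traces.items() if not bound_diverged(trace)]


def pick_best(usable, score_fn):
    """Return the usable combination with the highest score, or None."""
    if not usable:
        return None
    return max(usable, key=score_fn)


# Made-up traces and scores, for illustration only.
traces = {
    (1.0, 1.0, 1.0, 1.0): [-10.0, -5.0, -4.0],
    (1.0, 1.0, 1.0, 2.0): [-10.0, float("inf")],
    (2.0, 1.0, 1.0, 1.0): [-3.0, -8.0, -20.0],
    (2.0, 2.0, 1.0, 1.0): [-9.0, -6.0, -5.5],
}
scores = {(1.0, 1.0, 1.0, 1.0): 31.0, (2.0, 2.0, 1.0, 1.0): 42.0}
usable = screen_combinations(traces)
print(pick_best(usable, lambda combo: scores[combo]))
```

Screening on the bound first means the expensive perceptual scoring only has to run on the combinations that survive.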

In auditory streaming, also known as audio source separation, the goal is to decompose a music recording into the sound streams of individual instruments. One of the most effective classes of separation methods is based on nonnegative matrix factorization (NMF). This thesis presents a variational Bayesian (VB) treatment of NMF based on the Itakura-Saito (IS) divergence and the concept of hyperparameters, and derives a lower bound on the marginal likelihood to approximate the posterior density of the NMF factors. It proposes an efficient iterative algorithm that outperforms previously derived statistical NMF methods such as expectation-maximization IS-NMF (EM-IS-NMF). The proposed algorithm works in the equivalent rectangular bandwidth (ERB) domain, where the main resonances of the music signal are emphasized. In addition, the hyperparameters are optimized for the case of an inverse-Gamma prior. Simulations scored with the perceptual evaluation methods for audio source separation (PEASS) tool show that the proposed factorization improves separation results over EM-IS-NMF. A comparative study of the VB-IS-NMF and EM-IS-NMF algorithms, applied to the ERB spectrogram of a short vocal and bass sequence recorded in real conditions, is presented. Simulations also show that the proposed VB-IS-NMF can successfully stream music clips from the Signal Separation Evaluation Campaign (SiSEC 2013). Finally, on the audio signals provided by SiSEC, the proposed algorithm also outperforms other methods that do not require explicit training data.
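For readers unfamiliar with the IS divergence that the abstract builds on, the sketch below factorizes a small positive matrix V into W and H by minimizing the sum of d_IS(x | y) = x/y - log(x/y) - 1 over all entries. It is not the VB-IS-NMF or EM-IS-NMF algorithm of the thesis. It swaps in plain majorization-minimization multiplicative updates (exponent 1/2), with no priors and no hyperparameters, and all function names are illustrative assumptions.

```python
import math
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def is_divergence(v, v_hat):
    """Itakura-Saito divergence summed over all matrix entries."""
    return sum(x / y - math.log(x / y) - 1.0
               for row_v, row_h in zip(v, v_hat) for x, y in zip(row_v, row_h))


def update_h(v, w, h):
    """One MM update of H for fixed W:
    H <- H * sqrt((W^T (V / Vhat^2)) / (W^T (1 / Vhat))), elementwise."""
    v_hat = matmul(w, h)
    wt = transpose(w)
    num = matmul(wt, [[x / (y * y) for x, y in zip(rv, rh)] for rv, rh in zip(v, v_hat)])
    den = matmul(wt, [[1.0 / y for y in rh] for rh in v_hat])
    return [[hv * math.sqrt(n / d) for hv, n, d in zip(rh, rn, rd)]
            for rh, rn, rd in zip(h, num, den)]


def is_nmf(v, k, n_iter=50, seed=0):
    """Factorize V (F x N, strictly positive) as W (F x K) times H (K x N)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.uniform(0.5, 1.5) for _ in range(k)] for _ in range(n_rows)]
    h = [[rng.uniform(0.5, 1.5) for _ in range(n_cols)] for _ in range(k)]
    history = [is_divergence(v, matmul(w, h))]
    for _ in range(n_iter):
        h = update_h(v, w, h)
        # W is updated through the transposed problem V^T ~ H^T W^T.
        w = transpose(update_h(transpose(v), transpose(h), transpose(w)))
        history.append(is_divergence(v, matmul(w, h)))
    return w, h, history


# Toy "spectrogram" with 4 frequency bands and 5 frames (made-up values).
V = [[1.0, 2.0, 3.0, 2.0, 1.0],
     [2.0, 4.1, 6.0, 4.0, 2.0],
     [0.5, 0.4, 0.3, 0.4, 0.5],
     [1.0, 0.9, 0.8, 0.9, 1.0]]
W, H, history = is_nmf(V, k=2, n_iter=30)
print(history[0], history[-1])
```

In a separation setting, each column of W acts as a spectral template and each row of H as its activation over time. The VB treatment in the thesis additionally places priors on the factors and tracks a lower bound on the marginal likelihood, which this sketch does not attempt.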
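Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
1. Introduction
1.1 Research Motivation
1.2 Research Objectives and Overview of Methods
1.3 Introduction to NMF
1.4 Literature Review
1.5 Thesis Organization
2. Variational Bayesian NMF
2.1 Statistical Assumptions of NMF
2.2 Estimating the Posterior Distribution
2.3 Lower-Bound Function
2.4 Hyperparameter Optimization
2.5 Computational Algorithm
3. Experimental Analysis of Audio Separation
3.1 Music Data
3.2 Experimental Parameters for Audio Separation
3.2.1 Spectral Transform Parameters
3.2.2 Initial NMF Parameter K
3.2.3 Initial Hyperparameter Settings and Choice of Update Equations
3.2.4 Convergence Criterion of the Algorithm
3.3 Evaluation Criteria for Audio Separation
3.4 Experimental Procedure for Audio Separation
3.5 Audio Separation Results and Discussion
3.5.1 Music Data, Category 1
3.5.2 Music Data, Category 2
3.5.3 Comparison of Separation Results with SiSEC Campaign Results
4. Summary
4.1 Conclusions
4.2 Future Work
References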

