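Graduate student: 陳柏瑞
Graduate student (English): Chen Bo Rui
Thesis title: 頻域上雙聲道聲源分離方法:運算簡化以及音質改進之做法 (Two-Channel Source Separation in the Frequency Domain: Approaches to Reducing Computation and Improving Sound Quality)
Thesis title (English): Source Separation in Frequency Domain: Computation Cost Reduction and Sound Quality Enhancement
Advisor: 劉奕汶
Advisor (English): Liu, Yi Wen
Oral defense committee: 白明憲; 李夢麟; 李祈均
Oral defense committee (English): Bai, Ming Sian; Lee, Meng Ling; Lee, Chi Chun
Oral defense date: 2017-01-13
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2017
Graduation academic year: 105 (ROC calendar; 2016-2017)
Language: Chinese
Number of pages: 52
Keywords (Chinese, translated): source separation; independent component analysis; permutation problem; scaling problem
Keywords (English): source separation; ICA; scaling problem; permutation problem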
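Abstract (translated from the Chinese): This thesis performs sound source separation in the frequency domain using independent component analysis (ICA). In real environments, sounds reach the microphones as convolutive mixtures. Applying the short-time Fourier transform effectively simplifies the ICA computation compared with the time domain, but two ambiguities affect signal reconstruction: the scaling problem and the permutation problem. This thesis proposes an algorithm that addresses both, and uses the time difference of arrival (TDOA) to localize source directions and decide whether two sources are present, so that running separation on a single source does not degrade the result or add computation time. For the scaling problem, Gaussian mixture models approximate the distributions of the separated and mixed signals in each frequency bin, and the difference between the means of the largest-weight components is used for compensation, which reduces the high-frequency noise caused by the scaling problem. For the permutation problem, the method exploits the fact that frequency bands from the same source are relatively highly correlated in how they vary over time, so separated signals with high correlation are assigned to the same source. The thesis first finds a group of several highly correlated frequency bins to serve as a reference; each separated signal to be ordered is then compared with the reference bins to find the best permutation, which simplifies the otherwise elaborate permutation procedure and lowers the computational complexity. Compared with [30], the proposed method reduces computation time by 17 seconds and increases the signal-to-interference ratio (SIR) by 4 dB, and its score in an on-site questionnaire on sound quality and separation performance is 1.4 points higher. To evaluate separation performance, subjects listened to three mixed recordings and three separated recordings and wrote down the words they heard; the number of correct words for the three separated recordings was 41%, 26%, and 45% higher than for the corresponding mixed recordings. The results show that, compared with [30], the proposed algorithm effectively reduces the complexity of the permutation problem and reduces noise, achieving the goals of better sound quality and faster computation.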
In a real environment, sound sources are mixed convolutively, which makes them difficult to separate in the time domain. We therefore apply independent component analysis (ICA) in the frequency domain. Frequency-domain ICA reduces the computation, but it introduces two important ambiguities, the scaling problem and the permutation problem, both of which affect the reconstruction of the separated sources. In this thesis, a new approach is proposed for solving the scaling problem and the permutation problem. In addition, the time difference of arrival (TDOA) is used to confirm that two sources are present simultaneously. To solve the scaling problem, a Gaussian mixture model is used to approximate the distributions of the separated signal and the mixed signal, and the difference between the mean of the separated signal and the mean of the mixed signal is compensated. For the permutation problem, the algorithm relies on the assumption that the temporal envelopes of neighboring frequencies from the same sound source are highly correlated. First, we find five neighboring frequency bins that are highly correlated with one another and use them as a reference. The permutation of the separated sources in the other frequency bins is then determined from their correlation with this reference. We compare our results with those of the approach in [30]: computation time is reduced by 17 seconds and the SIR improves by 4 dB. In the questionnaire, our approach also receives a higher score than [30]. Sixty-six subjects were recruited for a listening comprehension test, in which the accuracy for the separated sources was 41%, 26%, and 45% higher than for the unprocessed sounds. The results show that our approach reduces computation cost and enhances sound quality compared with the existing method [30].
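The envelope-correlation idea behind the permutation step can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `corr` and `align_permutations`, the array layout, and the synthetic test data are all assumptions made for illustration, and the reference bins are simply given rather than found by a search for highly correlated neighbors. For two sources, each frequency bin has only two possible orderings, so the sketch keeps whichever ordering correlates better with the reference envelopes.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation between two 1-D real sequences."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def align_permutations(Y, ref_bins):
    """Reorder per-bin outputs so each output index follows one source.

    Y        : complex array (n_bins, 2, n_frames), hypothetical per-bin ICA outputs.
    ref_bins : indices of bins ASSUMED to be consistently ordered already.
    Returns the aligned array and a boolean mask of the bins that were swapped.
    """
    env = np.abs(Y)                     # temporal envelope of every bin/output
    ref = env[ref_bins].mean(axis=0)    # (2, n_frames) reference envelopes
    out = Y.copy()
    swapped = np.zeros(Y.shape[0], dtype=bool)
    for f in range(Y.shape[0]):
        keep = corr(env[f, 0], ref[0]) + corr(env[f, 1], ref[1])
        swap = corr(env[f, 0], ref[1]) + corr(env[f, 1], ref[0])
        if swap > keep:                 # the swapped order matches the reference better
            out[f] = Y[f, ::-1]
            swapped[f] = True
    return out, swapped

# Synthetic check (illustrative data only, not from the thesis).
rng = np.random.default_rng(0)
n_bins, n_frames = 64, 400
e = rng.gamma(2.0, 1.0, size=(2, n_frames))             # two independent envelopes
gains = rng.uniform(0.5, 2.0, size=(n_bins, 2, 1))      # arbitrary per-bin scaling
phase = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(n_bins, 2, n_frames)))
Y = gains * e[None] * phase

ref_bins = np.arange(10, 15)                            # five neighboring bins
true_swap = rng.random(n_bins) < 0.5
true_swap[ref_bins] = False                             # reference bins stay in order
Y[true_swap] = Y[true_swap][:, ::-1]                    # scramble the other bins

aligned, swapped = align_permutations(Y, ref_bins)
```

In this synthetic case every bin shares exactly the same envelope shape, so the decision is easy. Real speech envelopes are only partially correlated across frequency, which is why the choice of reference bins matters in practice.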
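Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Method
1.3 Research Objectives
1.4 Chapter Overview
Chapter 2 Independent Component Analysis
2.1 Background and Basic Theory
2.2 Assumptions
2.3 Preprocessing
2.4 Independence Functions
2.5 Optimization Algorithms
2.6 Ambiguities
Chapter 3 Sound Source Separation
3.1 Source Separation System
3.2 Frequency-Domain Source Separation Procedure
3.3 Two-Source Detection
Chapter 4 Solutions to the Scaling, Permutation, and Circular Convolution Problems
4.1 Scaling Problem
4.2 Permutation Problem
Chapter 5 Experimental Methods and Results
5.1 Experimental Equipment
5.2 Evaluation Methods
5.3 Experimental Results
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
Appendix
A.1 List of Subjects
A.2 Derivation of the Convergence Equation for Complex ICA
A.3 Comments and Suggestions from the Oral Defense Committee
A.4 Evaluation with the Google Speech Recognition System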
[1] M. G. Lopez P., H. Molina Lozano, L. P. Sanchez F., and L. N. Oliva Moreno, “Blind Source Separation of audio signals using independent component analysis and wavelets,” in CONIELECOMP 2011, 21st International Conference on Electrical Communications and Computers, 2011, pp. 152–157.
[2] J. Nikunen and T. Virtanen, “Direction of Arrival Based Spatial Covariance Model for Blind Sound Source Separation,” IEEE/ACM Trans. Audio, Speech, and Language Processing, vol. 22, no. 3, pp. 727–739, Mar. 2014.
[3] Y. Yang, Z. Li, X. Wang, and D. Zhang, “Noise source separation based on the blind source separation,” in 2011 Chinese Control and Decision Conference (CCDC), 2011, pp. 2236–2240.
[4] Yun-Hsuan Hsiao, “Multiple Source Tracking and Separation Using MUSIC Algorithm,” Sound and Music Innovative Technologies, College of Engineering, National Chiao Tung University, 2011.
[5] Li-Wen Ho, “Using Generalized Gaussian Mixture Model to Detect Sound Locations of Unknown Number of Sources for Sound Segregation,” Communication Engineering, College of Electrical and Computer Engineering, National Chiao Tung University, 2012.
[6] A. Hyvärinen and E. Oja, “Independent component analysis: algorithms and applications,” Neural Networks, vol. 13, no. 4, pp. 411–430, 2000.
[7] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometrics and Intelligent Laboratory Systems, 1987.
[8] J.-F. Cardoso, “Blind signal separation: statistical principles,” Proc. IEEE, vol. 86, no. 10, pp. 2009–2025, 1998.
[9] M. Zibulevsky and B. A. Pearlmutter, “Blind Source Separation by Sparse Decomposition in a Signal Dictionary,” Neural Computation, vol. 13, no. 4, pp. 863–882, Apr. 2001.
[10] Zhitang Chen and Laiwan Chan, “New approaches for solving permutation indeterminacy and scaling ambiguity in frequency domain separation of convolved mixtures,” in Proceedings of International Joint Conference on Neural Networks, San Jose, California, USA, July 31 – August 5, 2011, pp. 911–918.
[11] Yi-Ru Lian, “An investigation of frequency domain ICA for speech signal separation,” Department of Electrical and Control Engineering, College of Electrical Engineering and Computer Science, National Chiao Tung University, 2004.
[12] C. Jutten and J. Herault, “Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture,” Signal Processing, vol. 24, no. 1, pp. 1–10, 1991.
[13] P. Comon, “Independent component analysis, a new concept?,” Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
[14] A. J. Bell and T. J. Sejnowski, “An information-maximization approach to blind separation and blind deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129–1159, Nov. 1995.
[15] A. Hyvärinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Trans. Neural Networks, vol. 10, no. 3, pp. 626–634, Jan. 1999.
[16] A. Mansour and C. Jutten, “What should we say about the kurtosis?,” IEEE Signal Processing Letters, vol. 6, no. 12, pp. 321–322, Dec. 1999.
[17] P. Huber, “Projection pursuit,” The Annals of Statistics, vol. 13, no. 2, pp. 435–475, 1985.
[18] T. Cover and J. Thomas, Elements of information theory. John Wiley & Sons, Inc., 2012.
[19] M. Jones and R. Sibson, “What is projection pursuit?,” Journal of the Royal Statistical Society, Series A (General), pp. 1–37, 1987.
[20] A. Hyvärinen, “New approximations of differential entropy for independent component analysis and projection pursuit,” in Advances in Neural Information Processing Systems 10, 1998, pp. 273–279.
[21] H. Shen and K. Hüper, “Newton-Like methods for parallel independent component analysis,” in 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, 2006, pp. 283–288.
[22] S. Choi, S. Amari, A. Cichocki, and R. Liu, “Natural gradient learning with a nonholonomic constraint for blind deconvolution of multiple channels,” in First International Workshop on Independent Component Analysis and Signal Separation, 1999.
[23] D. Luenberger, Optimization by vector space methods, John Wiley & Sons, Inc., 1969.
[24] K. Matsuoka, “Minimal distortion principle for blind source separation,” in Proceedings of the 41st SICE Annual Conference. SICE 2002., 2002, vol. 4, pp. 2138–2143.
[25] M. S. Lewicki and T. J. Sejnowski, “Learning Overcomplete Representations,” Neural Computation, vol. 12, no. 2, pp. 337–365, 2000.
[26] D. M. Malioutov, M. Çetin, and A. S. Willsky, “Homotopy continuation for sparse signal representation,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005.
[27] M. Z. Ikram and D. R. Morgan, “A beamforming approach to permutation alignment for multichannel frequency-domain blind speech separation,” in IEEE International Conference on Acoustics Speech and Signal Processing, 2002, vol. 1, pp. I–881–I–884.
[28] F. Nesta, T. S. Wada, and B.-H. Juang, “Coherent spectral estimation for a robust solution of the permutation problem,” in 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009, pp. 105–108.
[29] D. Nion, K. N. Mokios, N. D. Sidiropoulos, and A. Potamianos, “Batch and adaptive PARAFAC-Based blind separation of convolutive speech mixtures,” IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 6, pp. 1193–1207, Aug. 2010.
[30] Huang-Yi Li, “Solving the permutation problem in frequency domain source separation based on the correlation of envelopes between frequencies,” Master's thesis, National Tsing Hua University, 2015.
[31] L. Parra and C. Spence, “Convolutive blind separation of non-stationary sources,” IEEE Trans. Speech and Audio Processing, vol. 8, no. 3, pp. 320–327, May 2000.
[32] V. G. Reju, “Underdetermined convolutive blind source separation via time–frequency masking,” IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 1, pp. 101–116, Jan. 2010.
[33] M. Joho, H. Mathis, and R. Lambert, “Overdetermined blind source separation: Using more sensors than source signals in a noisy mixture,” in Proc. ICA, 2000.
[34] E. Bingham and A. Hyvärinen, “A fast fixed-point algorithm for independent component analysis of complex valued signals,” International Journal of Neural Systems, vol. 10, no. 1, pp. 1–8, 2000.
[35] Chao-Wen Li, “A probabilistic model for sound direction of arrival estimation based on signal-to-noise ratios in the frequency domain,” Master's thesis, National Tsing Hua University, 2015.
[36] E. Vincent, R. Gribonval, and C. Fevotte, “Performance measurement in blind audio source separation,” IEEE Trans. Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, Jul. 2006.
[37] R. Mazur, J. O. Jungmann, and A. Mertins, “A new clustering approach for solving the permutation problem in convolutive blind source separation,” in 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013, pp. 1–4.
[38] http://www.kecl.ntt.co.jp/icl/signal/sawada/demo/bss2to4/index.html.
[39] https://www.google.com/intl/en/chrome/demos/speech.html