Graduate student: Chen Bo Rui
Thesis title: Source Separation in Frequency Domain: Computation Cost Reduction and Sound Quality Enhancement
Advisor: Liu, Yi Wen
Committee members: Bai, Ming Sian; Lee, Meng Ling; Lee, Chi Chun
Keywords: source separation; ICA; scaling problem; permutation problem
In a real environment, sound sources are mixed convolutively, which makes them difficult to separate in the time domain. We therefore apply independent component analysis (ICA) in the frequency domain. Frequency-domain ICA reduces the computational load, but it introduces two important ambiguities, the scaling problem and the permutation problem, both of which affect the reconstruction of the separated sources. This thesis proposes a new approach to solving both problems. In addition, the time difference of arrival (TDOA) is used to confirm that two sources are active simultaneously.

To solve the scaling problem, a Gaussian mixture model is used to approximate the distributions of the separated signal and the mixed signal; the difference between the mean of the separated signal and the mean of the mixed signal is then compensated. For the permutation problem, the algorithm relies on the assumption that the temporal envelopes of neighboring frequencies from the same sound source are highly correlated. First, we find five neighboring frequency bins that are highly correlated with one another and use them as a standard. The permutation of the separated sources in every other frequency bin is then determined from their correlation with this standard.

We compare our results with those of the approach in [30]. Computation time is reduced by 17 seconds and the signal-to-interference ratio (SIR) improves by 4 dB. In the questionnaire-based evaluation, our approach also receives a higher score than [30]. In addition, 66 subjects were recruited for a listening comprehension test. The listening comprehension accuracy for the separated sources is 41%, 26%, and 45% higher than for the unprocessed sounds. The results show that our approach reduces computation cost and enhances sound quality compared with the existing method [30].
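The envelope-correlation idea behind the permutation step can be illustrated with a small sketch. This is not the thesis's implementation: the function names `envelope_corr` and `align_permutations`, the argument `ref_bins`, and the synthetic data are all hypothetical. The sketch also assumes the reference bins are already known and consistently ordered, instead of searching for five mutually correlated neighboring bins, and it tries every permutation exhaustively, which is only practical for a small number of sources.

```python
import itertools
import numpy as np


def envelope_corr(a, b):
    """Pearson correlation between two 1-D temporal envelopes."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def align_permutations(env, ref_bins):
    """Choose a source ordering for every frequency bin (hypothetical helper).

    env      : array (n_src, n_freq, n_frames) of magnitude envelopes
    ref_bins : bins ASSUMED to be consistently ordered already
    Returns an int array (n_freq, n_src); entry [f, i] is the separated
    channel of bin f that should be assigned to output source i.
    """
    n_src, n_freq, _ = env.shape
    # Reference envelope per source: average over the reference bins.
    ref = env[:, ref_bins, :].mean(axis=1)  # (n_src, n_frames)
    perms = list(itertools.permutations(range(n_src)))
    order = np.zeros((n_freq, n_src), dtype=int)
    for f in range(n_freq):
        # Score each candidate ordering by its total correlation
        # with the reference envelopes and keep the best one.
        scores = [
            sum(envelope_corr(env[p[i], f], ref[i]) for i in range(n_src))
            for p in perms
        ]
        order[f] = perms[int(np.argmax(scores))]
    return order


# Synthetic check: two independent envelopes, randomly scaled per bin,
# with the source order deliberately swapped in a few bins.
rng = np.random.default_rng(0)
n_freq, n_frames = 16, 200
base = rng.random((2, n_frames))
env = np.stack(
    [np.outer(rng.uniform(0.5, 2.0, n_freq), base[i]) for i in range(2)]
)
swapped = [3, 7, 12]
env[:, swapped, :] = env[::-1, swapped, :]

order = align_permutations(env, ref_bins=[0, 1, 2, 4, 5])
print(order[swapped])
```

Because the Pearson correlation is invariant to positive scaling, this kind of alignment does not depend on the scaling ambiguity having been resolved first; the two problems can be treated separately in a sketch like this one.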
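Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Methods
1.3 Research Objectives
1.4 Chapter Overview
Chapter 2 Independent Component Analysis
2.1 Background and Basic Theory
2.2 Assumptions
2.3 Preprocessing
2.4 Independence Functions
2.5 Optimization Algorithms
2.6 Ambiguities
Chapter 3 Sound Source Separation
3.1 Sound Source Separation System
3.2 Frequency-Domain Source Separation Procedure
3.3 Two-Source Detection
Chapter 4 Solutions to the Scaling Problem, Permutation Problem, and Circular Convolution Problem
4.1 Scaling Problem
4.2 Permutation Problem
Chapter 5 Experimental Methods and Results
5.1 Experimental Equipment
5.2 Evaluation Methods
5.3 Experimental Results
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
Appendix
A.1 List of Subjects
A.2 Derivation of the Convergence Equation for Complex-Valued Independent Component Analysis
A.3 Comments and Suggestions from the Oral Defense Committee
A.4 Evaluation with the Google Speech Recognition System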
[1] M. G. Lopez P., H. Molina Lozano, L. P. Sanchez F., and L. N. Oliva Moreno, “Blind Source Separation of audio signals using independent component analysis and wavelets,” in CONIELECOMP 2011, 21st International Conference on Electrical Communications and Computers, 2011, pp. 152–157.
[2] J. Nikunen and T. Virtanen, “Direction of Arrival Based Spatial Covariance Model for Blind Sound Source Separation,” IEEE/ACM Trans. Audio, Speech, and Language Processing, vol. 22, no. 3, pp. 727–739, Mar. 2014.
[3] Y. Yang, Z. Li, X. Wang, and D. Zhang, “Noise source separation based on the blind source separation,” in 2011 Chinese Control and Decision Conference (CCDC), 2011, pp. 2236–2240.
[4] Yun-Hsuan Hsiao, “Multiple Source Tracking and Separation Using MUSIC Algorithm,” Sound and Music Innovative Technologies, College of Engineering, National Chiao Tung University, 2011.
[5] Li-Wen Ho, “Using Generalized Gaussian Mixture Model to Detect Sound Locations of Unknown Number of Sources for Sound Segregation,” Communication Engineering, College of Electrical and Computer Engineering, National Chiao Tung University, 2012.
[6] A. Hyvärinen and E. Oja, “Independent component analysis: algorithms and applications,” Neural Networks, vol. 13, no. 4, pp. 411–430, 2000.
[7] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometrics and Intelligent Laboratory Systems, 1987.
[8] J.-F. Cardoso, “Blind signal separation: statistical principles,” Proc. IEEE, vol. 86, no. 10, pp. 2009–2025, 1998.
[9] M. Zibulevsky and B. A. Pearlmutter, “Blind Source Separation by Sparse Decomposition in a Signal Dictionary,” Neural Computation, vol. 13, no. 4, pp. 863–882, Apr. 2001.
[10] Zhitang Chen and Laiwan Chan, “New approaches for solving permutation indeterminacy and scaling ambiguity in frequency domain separation of convolved mixtures,” in Proceedings of International Joint Conference on Neural Networks, San Jose, California, USA, Jul. 31–Aug. 5, 2011, pp. 911–918.
[11] Yi-Ru Lian, “An investigation of frequency domain ICA for speech signal separation,” Department of Electrical and Control Engineering, College of Electrical Engineering and Computer Science, National Chiao Tung University, 2004.
[12] C. Jutten and J. Herault, “Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture,” Signal Processing, vol. 24, no. 1, pp. 1–10, 1991.
[13] P. Comon, “Independent component analysis, a new concept?,” Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
[14] A. J. Bell and T. J. Sejnowski, “An information-maximization approach to blind separation and blind deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129–1159, Nov. 1995.
[15] A. Hyvärinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Trans. Neural Networks, vol. 10, no. 3, pp. 626–634, Jan. 1999.
[16] A. Mansour and C. Jutten, “What should we say about the kurtosis?,” IEEE Signal Processing Letters, vol. 6, no. 12, pp. 321–322, Dec. 1999.
[17] P. Huber, “Projection pursuit,” The Annals of Statistics, vol. 13, no. 2, pp. 435–475, 1985.
[18] T. Cover and J. Thomas, Elements of information theory. John Wiley & Sons, Inc., 2012.
[19] M. Jones and R. Sibson, “What is projection pursuit?,” Journal of the Royal Statistical Society. Series A (General), pp. 1–37, 1987.
[20] A. Hyvärinen, “New approximations of differential entropy for independent component analysis and projection pursuit,” Advances in Neural Information Processing Systems, vol. 10, pp. 273–279, 1998.
[21] H. Shen and K. Huper, “Newton-Like methods for parallel independent component analysis,” in 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, 2006, pp. 283–288.
[22] S. Choi, S. Amari, A. Cichocki, and R. Liu, “Natural gradient learning with a nonholonomic constraint for blind deconvolution of multiple channels,” in First International Workshop on Independent Component Analysis and Signal Separation, 1999.
[23] D. Luenberger, Optimization by vector space methods, John Wiley & Sons, Inc., 1969.
[24] K. Matsuoka, “Minimal distortion principle for blind source separation,” in Proceedings of the 41st SICE Annual Conference (SICE 2002), 2002, vol. 4, pp. 2138–2143.
[25] M. S. Lewicki and T. J. Sejnowski, “Learning overcomplete representations,” Neural Computation, vol. 12, no. 2, pp. 337–365, 2000.
[26] D. M. Malioutov, M. Çetin, and A. S. Willsky, “Homotopy continuation for sparse signal representation,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005.
[27] M. Z. Ikram and D. R. Morgan, “A beamforming approach to permutation alignment for multichannel frequency-domain blind speech separation,” in IEEE International Conference on Acoustics Speech and Signal Processing, 2002, vol. 1, pp. I–881–I–884.
[28] F. Nesta, T. S. Wada, and B.-H. Juang, “Coherent spectral estimation for a robust solution of the permutation problem,” in 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009, pp. 105–108.
[29] D. Nion, K. N. Mokios, N. D. Sidiropoulos, and A. Potamianos, “Batch and adaptive PARAFAC-Based blind separation of convolutive speech mixtures,” IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 6, pp. 1193–1207, Aug. 2010.
[30] Huang-Yi Li, “Solving the permutation problem in frequency domain source separation based on the correlation of envelopes between frequencies,” Master's thesis, National Tsing Hua University, 2015.
[31] L. Parra and C. Spence, “Convolutive blind separation of non-stationary sources,” IEEE Trans. Speech and Audio Processing, vol. 8, no. 3, pp. 320–327, May 2000.
[32] V. G. Reju, “Underdetermined convolutive blind source separation via time–frequency masking,” IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 1, pp. 101–116, Jan. 2010.
[33] M. Joho, H. Mathis, and R. Lambert, “Overdetermined blind source separation: Using more sensors than source signals in a noisy mixture,” in Proc. ICA, 2000.
[34] E. Bingham and A. Hyvärinen, “A fast fixed-point algorithm for independent component analysis of complex valued signals,” International Journal of Neural Systems, vol. 10, no. 1, pp. 1–8, 2000.
[35] Chao-Wen Li, “A probabilistic model for sound direction of arrival estimation based on signal-to-noise ratios in the frequency domain,” Master's thesis, National Tsing Hua University, 2015.
[36] E. Vincent, R. Gribonval, and C. Fevotte, “Performance measurement in blind audio source separation,” IEEE Trans. Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, Jul. 2006.
[37] R. Mazur, J. O. Jungmann, and A. Mertins, “A new clustering approach for solving the permutation problem in convolutive blind source separation,” in 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013, pp. 1–4.
[38] http://www.kecl.ntt.co.jp/icl/signal/sawada/demo/bss2to4/index.html.
[39] https://www.google.com/intl/en/chrome/demos/speech.html