
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 鄭祺勳
Author (English): ZHENG, QI-XUN
Title: 即時多聲源定位與分離之實現
Title (English): Implementation of Real-Time Multiple Sound Source Localization and Separation
Advisor: 許正欣
Advisor (English): SHEU, JENG-SHIN
Committee members: 許正欣, 周修平, 林建州, 鄭佳炘, 連振凱
Committee members (English): SHEU, JENG-SHIN; CHOU, VINCENT; LIN, CHIEN-CHOU; CHENG, CHIA-HSIN; LAIN, JENN-KAIE
Oral defense date: 2020-07-27
Degree: Master's
Institution: National Yunlin University of Science and Technology
Department: Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2020
Graduation academic year: 108 (2019-2020)
Language: Chinese
Pages: 90
Keywords (Chinese): 即時處理, 聲源定位, 聲源分離, 頻率區塊
Keywords (English): Real-time, Sound source localization, Sound source separation, Frequency zone
Usage statistics:
  • Cited by: 0
  • Views: 94
  • Downloads: 5
  • Bookmarked: 0
This thesis develops a technique that uses a microphone array to perform real-time speech localization and separation when the number of sound sources, their directions, and the background environment are all unknown. We use the Circular Integrated Cross Spectrum (CICS) to estimate the statistical distribution of the directions of arrival (DOAs) in a multi-source signal, from which the number of sources and their directions are obtained. The CICS is computed, for each pair of adjacent microphones, from the phase of the frequency bin with the largest correlation in the cross-power spectrum together with the corresponding phase rotation factors. Given the number of sources and their directions, we perform speech separation using the source DOAs computed in each frequency zone.
We compare separation performance against existing methods: Independent Component Analysis (ICA), blind source separation (BSS), and Deep-mask. Experimental results show that although ICA can perform simple source separation without knowing the number of sources or their directions, its separation quality is highly unstable. Given the number of sources and their directions, BSS improves on ICA, but the multi-source signals it processes must satisfy certain independence assumptions. With the source count and directions obtained via CICS, our method achieves stable separation without any additional assumptions. The deep-learning Deep-mask technique also achieves stable separation, but it is constrained by the assumptions built into its training data, such as a fixed number of sources and identical background noise; once the test signals violate those assumptions, separation quality degrades. Our method needs neither these prior assumptions nor any training stage. As for computation time, for a 6-second audio clip our method takes 1.8 s, ICA 0.4 s, BSS 1.8 s, and Deep-mask 4.4 s.
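The per-pair DOA estimation described above (converting the cross-power-spectrum phase of the dominant bin in each frequency zone into an angle, then accumulating the angles into a statistical distribution) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the sampling rate, microphone spacing, zone size, histogram resolution, and the far-field broadside model for an adjacent pair are all assumed values.

```python
import numpy as np

def pairwise_doa_histogram(x1, x2, fs=16000, mic_distance=0.05,
                           fft_size=512, zone_size=8, c=343.0):
    """Build a DOA histogram from one adjacent microphone pair.

    For each frequency zone, take the bin with the largest cross-power
    magnitude, convert its phase difference into a time delay, and map
    the delay to an angle under a far-field model.
    """
    X1 = np.fft.rfft(x1, fft_size)
    X2 = np.fft.rfft(x2, fft_size)
    cross = X1 * np.conj(X2)                  # cross-power spectrum
    freqs = np.fft.rfftfreq(fft_size, d=1.0 / fs)

    angles = []
    for start in range(1, len(cross) - zone_size, zone_size):
        zone = slice(start, start + zone_size)
        k = start + int(np.argmax(np.abs(cross[zone])))  # dominant bin
        f = freqs[k]
        if f == 0:
            continue
        tau = np.angle(cross[k]) / (2 * np.pi * f)       # delay in seconds
        s = np.clip(tau * c / mic_distance, -1.0, 1.0)
        angles.append(np.degrees(np.arcsin(s)))          # broadside DOA
    hist, edges = np.histogram(angles, bins=36, range=(-90, 90))
    return hist, edges
```

With broadband input delayed by one sample between the two channels, the histogram peaks near the angle implied by that delay; in the full system, the per-zone angles from all adjacent pairs feed the CICS-based distribution from which source count and DOAs are read off.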

This thesis develops a technique that uses a microphone array to perform real-time voice localization and separation without knowledge of the number of sound sources, their directions, or the background environment. We use the Circular Integrated Cross Spectrum (CICS) to estimate the statistical distribution of the directions of arrival (DOAs) of a multi-source signal, and from this distribution devise a localization algorithm that yields the number of sources and their DOAs. Based on the cross-power spectrum of each frequency zone (FZ) for each adjacent microphone pair, we compute the CICS over DOAs for each FZ from the phase of the frequency bin with the greatest correlation and the corresponding phase rotation factor. Using the estimated source count and DOAs, we then obtain the DOA of the source in each FZ for our speech separation algorithm.
We compare separation performance with existing methods: Independent Component Analysis (ICA), blind source separation (BSS), and Deep-mask. Experimental results show that although ICA can perform simple source separation without knowing the number of sources or their DOAs, its performance is extremely unstable. Given the number of sources and their DOAs, BSS improves on ICA, but the multi-source signals it processes must satisfy certain independence assumptions. With the source count and DOAs obtained from the CICS, the proposed method achieves stable separation without any further assumptions. Although the deep-learning Deep-mask method also achieves stable separation, it is limited by the assumptions embedded in its training data, such as a fixed number of sources and identical background noise; once the test signals violate these assumptions, separation quality suffers. Our method requires neither such assumptions nor any training stage. Regarding processing time, for 6 seconds of audio our method takes 1.8 seconds, ICA 0.4 seconds, BSS 1.8 seconds, and Deep-mask 4.4 seconds.
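Once the sources are localized, the separation step described above reduces to routing each frequency zone of the mixture STFT to the source whose DOA is closest to that zone's estimated DOA. The following is a minimal binary-masking sketch of that idea, not the thesis's actual weight-function design; the array shapes, zone size, and function name are illustrative assumptions.

```python
import numpy as np

def separate_by_zone_doa(mix_stft, zone_doas, source_doas, zone_size=8):
    """Route each frequency zone of a mixture STFT to the closest source
    DOA (binary masking under a single-source-per-zone assumption).

    mix_stft    : complex array, shape (n_bins, n_frames)
    zone_doas   : estimated DOA in degrees for each frequency zone
    source_doas : DOAs in degrees of the localized sources
    Returns one masked STFT per source.
    """
    n_bins, _ = mix_stft.shape
    outputs = [np.zeros_like(mix_stft) for _ in source_doas]
    for z, doa in enumerate(zone_doas):
        # Assign the whole zone to the nearest localized source.
        src = int(np.argmin([abs(doa - s) for s in source_doas]))
        lo, hi = z * zone_size, min((z + 1) * zone_size, n_bins)
        outputs[src][lo:hi, :] = mix_stft[lo:hi, :]
    return outputs
```

An inverse STFT of each output would yield the separated time-domain signals. The hard 0/1 mask here is the crudest choice; a smoother weight function (as the thesis's separation chapter designs) trades some interference suppression for fewer musical-noise artifacts.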

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
1. Introduction
1.1 Overview
1.2 System Overview
1.3 Thesis Organization
2. Time-Frequency Analysis
2.1 Time-Frequency Domain
2.2 Single-Source Zone Assumption
3. Direction-of-Arrival Estimation
3.1 Microphone Array
3.2 Single-Path Signal Model
3.3 Source DOA Estimation
3.3.1 Phase of the Cross-Power Spectrum
3.3.2 Phase Rotation Factors
3.3.3 Circular Integrated Cross Spectrum (CICS)
3.4 Plotting the DOA Histogram
3.5 Matching Pursuit (MP) Algorithm
3.5.1 Blackman Window
3.5.2 Scanning the Histogram with a Blackman Window
3.5.3 Peak Contribution
3.5.4 Removing a Peak's Influence
3.5.5 Contribution Threshold
4. Sound Source Separation
4.1 Sound Source Separation
4.2 Separation Method
4.3 Weight Function Design
4.4 Separation Performance Evaluation
5. Experimental Results
5.1 Experimental Design
5.2 Simulations
5.3 Two Sound Sources
5.3.1 Set 1
5.3.2 Set 2
5.3.3 Set 3
5.3.4 Set 4
5.3.5 Localization and Separation Results for Two Sources
5.4 Four Sound Sources
5.4.1 Set 1
5.4.2 Set 2
5.4.3 Localization and Separation Results for Four Sources
5.5 Effect of FFT Size and Frequency Zone Size on Separation
5.5.1 Two Speakers
5.5.2 Four Speakers
5.5.3 Computation Time
6. Conclusions and Future Work
References

[1] A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411-430, 2000.
[2] S. Rickard, "The DUET blind source separation algorithm," in Blind Speech Separation, Springer, Dordrecht, 2007, pp. 217-241.
[3] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing. Wiley, 2002.
[4] A. Bell and T. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, pp. 1129-1159, 1995.
[5] J. Cardoso, "Blind signal separation: statistical principles," Proceedings of the IEEE, Special Issue on Blind System Identification and Estimation, pp. 2009-2025, Oct. 1998.
[6] E. Weinstein, M. Feder, and A. Oppenheim, "Multi-channel signal separation by decorrelation," IEEE Trans. on Speech and Audio Processing, vol. 1, no. 4, pp. 405-413, Oct. 1993.
[7] L. Parra and C. Spence, "Convolutive blind source separation of non-stationary sources," IEEE Trans. on Speech and Audio Processing, pp. 320-327, May 2000.
[8] H. Broman, U. Lindgren, H. Sahlin, and P. Stoica, "Source separation: a TITO system identification approach," Signal Processing, vol. 73, pp. 169-183, 1999.
[9] M. Kolbæk et al., "Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks," IEEE/ACM Trans. on Audio, Speech, and Language Processing, vol. 25, no. 10, pp. 1901-1913, 2017.
[10] D. Yu et al., "Permutation invariant training of deep models for speaker-independent multi-talker speech separation," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.
[11] Z.-Q. Wang, J. Le Roux, and J. R. Hershey, "Alternative objective functions for deep clustering," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2018.
[12] A. Griffin, D. Pavlidi, M. Puigt, and A. Mouchtaris, "Real-time multiple speaker DOA estimation in a circular microphone array based on matching pursuit," in Proc. 20th European Signal Processing Conference (EUSIPCO), 2012, pp. 2303-2307.
[13] D. Pavlidi, A. Griffin, M. Puigt, and A. Mouchtaris, "Real-time multiple sound source localization and counting using a circular microphone array," IEEE Trans. on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2193-2206, Oct. 2013.
[14] A. Griffin, D. Pavlidi, M. Puigt, and A. Mouchtaris, "Real-time multiple speaker DOA estimation in a circular microphone array based on matching pursuit," in Proc. 20th European Signal Processing Conference (EUSIPCO), 2012, pp. 2303-2307.
[15] S.-C. Tseng, "Lexical coverage in Taiwan Mandarin conversation," International Journal of Computational Linguistics and Chinese Language Processing, vol. 18, no. 1, pp. 1-18, 2013.
[16] S. Araki, F. Nesta, E. Vincent, Z. Koldovsky, G. Nolte, A. Ziehe, and A. Benichoux, "The 2011 Signal Separation Evaluation Campaign (SiSEC2011): audio source separation," in Proc. Int. Conf. on Latent Variable Analysis and Signal Separation, 2012, pp. 414-422.
