臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Author: 賴緯秩
Author (English): Lai, Wei-Chih
Title (Chinese): 影音整合處理來判斷多音源與人物關係
Title (English): Relationship of Multiple Sound Sources and Subjects Determined by Integrated Audio Visual Processing
Advisor: 陳自強 (Chen, Tzu-Chiang)
Committee: 陳自強 (Chen, Tzu-Chiang); 賴文能 (Lai, Wen-Neng); 黃敬群 (Huang, Ching-Chun); 薛幼苓 (Hsueh, Yu-Ling)
Oral Defense Date: 2018-01-17
Degree: Master's
Institution: 國立中正大學 (National Chung Cheng University)
Department: 電機工程研究所 (Graduate Institute of Electrical Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2018
Graduation Academic Year: 106 (2017–18)
Language: Chinese
Pages: 76
Keywords (Chinese): 居家照護, 麥克風陣列, 環景攝影機, 盲蔽訊號分離, 聲源數量, 聲源方位
Keywords (English): home care, microphone array, panoramic camera, blind source separation, sound source counting, sound source localization
Statistics:
  • Cited by: 1
  • Hits: 236
  • Downloads: 3
  • Bookmarked: 1
This thesis applies a microphone array together with a panoramic camera to identifying multiple sound sources and their types for home care. On the microphone-array side, we record signals in which several sources are active at once and first estimate how many sources are present; the overall source-counting accuracy is 92.7%. We then run a blind source separation (BSS) algorithm with the estimated source count as its input, producing as many estimated source signals as there are sources; each estimated signal represents the audio of one source, and in our experiments the signal-to-interference ratio reaches up to 11 dB. The BSS algorithm also yields a separation matrix, and we extract its phase information to estimate each source's direction; the rate of direction errors below 20 degrees is 45.6%. However, room reverberation degrades direction estimation, so we apply a suitable filter that attenuates the first reflection path, raising the rate of direction errors below 20 degrees to 46.5%. In addition, we use the panoramic camera's video as an aid: background subtraction finds the changing regions in the camera's view, and if a changing region is a quadrilateral we know it is a television screen. Each changing region yields a corresponding direction, which can be matched against the directions of the separated signals to identify the sounds emitted by the television and by people; in the future, the system can support home care by focusing on human speech.
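As a rough illustration of the source-counting step described above (and outlined in Sections 3.2.1.1 to 3.2.1.4: single-source frames, time differences, smoothed histogram, peak detection), the following numpy-only sketch collects per-frame time-difference-of-arrival estimates between two microphones into a smoothed histogram and counts its peaks. The frame length, synthetic delays, and thresholds are illustrative assumptions, not the thesis's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_tdoa(x1, x2, frame_len=256):
    """Per-frame delay (in samples) of x2 relative to x1 via cross-correlation."""
    delays = []
    for start in range(0, len(x1) - frame_len, frame_len):
        a = x1[start:start + frame_len]
        b = x2[start:start + frame_len]
        cc = np.correlate(b, a, mode="full")      # lag axis spans -(N-1)..(N-1)
        delays.append(int(np.argmax(cc)) - (frame_len - 1))
    return np.array(delays)

def count_sources(delays, max_lag=20, smooth=3):
    """Histogram the delays, smooth with a moving average, count local maxima."""
    hist, _ = np.histogram(delays, bins=np.arange(-max_lag, max_lag + 2))
    sm = np.convolve(hist, np.ones(smooth) / smooth, mode="same")
    peaks = [i for i in range(1, len(sm) - 1)
             if sm[i] > sm[i - 1] and sm[i] >= sm[i + 1]
             and sm[i] > 0.1 * sm.max()]          # ignore tiny noise bins
    return len(peaks)

# Two synthetic sources that alternate in time, observed with
# inter-microphone delays of -5 and +7 samples respectively.
n = 4096
s1 = rng.standard_normal(n)
s2 = rng.standard_normal(n)
mic1 = np.concatenate([s1, s2])
mic2 = np.concatenate([np.roll(s1, -5), np.roll(s2, 7)])

delays = frame_tdoa(mic1, mic2)
print(count_sources(delays))  # two histogram peaks -> 2 sources
```

The histogram-of-TDOAs idea works because frames dominated by a single source cluster around that source's delay; frames containing mixtures scatter and are suppressed by the smoothing and the peak threshold.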
In this thesis, we use a microphone array with a panoramic camera to identify multiple sound sources for home care. With the microphone array, we record the sound field and determine how many sound sources exist in the environment; the overall source-counting accuracy is 92.7%. A blind source separation (BSS) algorithm then takes the estimated source count as its input and separates the mixed signal, and each estimated signal represents the sound information of one source. In our experiments, the signal-to-interference ratio (SIR) reaches up to 11 dB. The BSS algorithm yields a separation matrix, and we take its phase information to determine the direction of each sound source; the direction estimation is correct 45.6% of the time. However, when echoes exist in the environment, direction estimation degrades, so we use a one-tap pitch filter to attenuate the first reflection path and improve the azimuth accuracy; the correct-direction rate then increases to 46.5%. In addition, we use the video from the panoramic camera as a supplement: a background subtraction method finds the changing regions within the camera's view, and if a changing region is a quadrilateral, we conclude that it is a television screen. Each changing region yields a corresponding direction, which can be compared with the directions of the separated signals to distinguish the sound emitted by the television from that of a person. In the future, we aim to support home care by analyzing human speech.
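The dereverberation step can be sketched as a one-tap filter y[n] = x[n] - g * x[n - T] that subtracts a scaled, delayed copy of the signal, attenuating the first reflection; the lag T is estimated from the autocorrelation peak, as in Section 3.2.4.2. The gain, lag search range, and synthetic echo below are illustrative assumptions, not the thesis's measured values.

```python
import numpy as np

def estimate_lag(x, min_lag=10, max_lag=500):
    """Lag of the strongest autocorrelation peak within [min_lag, max_lag]."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # non-negative lags only
    return min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))

def one_tap_filter(x, lag, gain=0.6):
    """y[n] = x[n] - gain * x[n - lag]: suppress the first reflection."""
    y = x.copy()
    y[lag:] -= gain * x[:-lag]
    return y

# Synthetic "room": direct path plus a single echo 120 samples later.
rng = np.random.default_rng(1)
dry = rng.standard_normal(2000)
echo = np.zeros_like(dry)
echo[120:] = 0.6 * dry[:-120]
wet = dry + echo

lag = estimate_lag(wet)
cleaned = one_tap_filter(wet, lag, gain=0.6)
print(lag)  # estimated first-reflection delay
print(np.sum((cleaned - dry) ** 2) < np.sum((wet - dry) ** 2))  # echo energy reduced
```

Note that the subtraction also injects a weaker second-order image of the echo at lag 2T, which is why the filter reduces, rather than fully removes, the reflection energy.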
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Background and Related Work
2.1 Developments in Home Care
2.2 Microphone Arrays and Their Applications
2.2.1 Noise Reduction
2.2.2 Sound Source Localization (SSL)
2.2.3 Source Separation
2.3 Blind Source Separation (BSS)
2.3.1 Time-Domain BSS
2.3.2 Frequency-Domain BSS
2.3.2.1 The Scaling Problem
2.3.2.2 The Permutation Problem
2.4 Related Work on Audio-Visual Integration
Chapter 3 Speaker Utterance Recognition for the Home Environment
3.1 Software and Equipment
3.2 System Flow and Methods
3.2.1 Sound Source Counting
3.2.1.1 Finding Single-Source Frames
3.2.1.2 Computing Time Differences
3.2.1.3 Histogram and Smoothed Curve
3.2.1.4 Peak Detection
3.2.2 Blind Source Separation (BSS)
3.2.3 Sound Source Direction Estimation
3.2.3.1 Computing Distance Differences
3.2.3.2 Source Direction Information
3.2.4 Filtering
3.2.4.1 Framing and Selecting Voiced Frames
3.2.4.2 Finding and Refining the Delay via Auto-correlation
3.2.4.3 The Filter
3.2.5 Detecting Moving Regions in Video
3.2.5.1 Background Subtraction
3.2.5.2 Television Screen Detection
Chapter 4 Experimental Results and Discussion
4.1 Sound Source Counting
4.2 Sound Source Direction Estimation
Chapter 5 Conclusions and Future Work
References
Appendix 1
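The moving-region detection of Section 3.2.5.1 rests on background subtraction: a background model is maintained over time, and pixels that deviate from it are flagged as changed. The sketch below is a minimal numpy version using a simple exponential running-average background model; the thesis cites an adaptive Gaussian-mixture model [40] instead, and the frame sizes, learning rate, and threshold here are arbitrary illustrative choices.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running-average background model."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=30.0):
    """Pixels whose absolute difference from the background exceeds thresh."""
    return np.abs(frame - bg) > thresh

# Synthetic 64x64 grey frames: a flat scene plus a bright 10x10 square
# that slides to the right by 3 pixels per frame.
frames = []
for t in range(10):
    f = np.full((64, 64), 100.0)
    f[20:30, 5 + 3 * t:15 + 3 * t] = 200.0
    frames.append(f)

bg = np.full((64, 64), 100.0)   # background initialised to the empty scene
for f in frames:
    mask = foreground_mask(bg, f)
    bg = update_background(bg, f)

print(int(mask.sum()))  # pixels flagged in the last frame: the 10x10 moving square
```

The slow learning rate keeps static scenery (and, in the thesis's setting, a rectangular region such as a television screen) distinguishable from transient motion; the flagged region's centroid can then be mapped to an azimuth for matching against the audio directions.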

[1]TransGlobe Life (全球人壽) retirement information, government key report, https://www.transglobe.com.tw/transglobe-retireplan/content/10400 .
[2]Amazon Echo, https://www.amazon.com/Amazon-Echo-Bluetooth-Speaker-with-WiFi-Alexa/dp/B00X4WHP5E .
[3]Google Home, https://madeby.google.com/intl/en_us/home/ .
[4]Tunstall Healthcare, http://www.tunstallhealthcare.com.au/ .
[5]AT&T EverThere, http://www.goodhousekeeping.com/health-products/health-tracker-reviews/a30201/a-t-and-t-everthere/ .
[6]Guider Technology Co., Ltd. (蓋德科技股份有限公司), http://www.guidercare.com/ .
[7]SecuFirst home-care series, digital wireless home audio/video monitor, http://www.secufirst.com.tw/products_detail.aspx?Pid=19 .
[8]Amazon Echo Teardown - iFixit, https://www.ifixit.com/Teardown/Amazon+Echo+Teardown/33953 .
[9]Google Home Teardown - iFixit, https://www.ifixit.com/Teardown/Google+Home+Teardown/72684 .
[10]K. S. R. Murty, and B. Yegnanarayana, “Combining evidence from residual phase and MFCC features for speaker recognition,” IEEE Signal Processing Letters, vol. 13, no. 1, pp. 52-55, 2006.
[11]S. Chu, S. Narayanan, and C. -C. J. Kuo, “Environmental sound recognition with time-frequency audio features,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 6, pp. 1142-1158, 2009.
[12]H. D. Tran, and H. Li, “Sound event recognition with probabilistic distance SVMs,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1556-1568, 2011.
[13]Delay Sum Beamforming - The Lab Book Pages, http://www.labbookpages.co.uk/audio/beamforming/delaySum.html .
[14]H. Xia, K. Yang, Y. Ma, Y. Wang, and Y. Liu, “Noise reduction method for acoustic sensor arrays in underwater noise,” IEEE Sensors Journal, vol. 16, no. 24, pp. 8972-8981, 2016.
[15]X. Alameda-Pineda, and R. Horaud, “A geometric approach to sound source localization from time-delay estimates,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 6, pp. 1082-1095, 2014.
[16]X. Shi, B. D. O. Anderson, G. Mao, Z. Yang, J. Chen, and Z. Lin, “Robust localization using time difference of arrivals,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1320-1324, 2016.
[17]J. Benesty, J. Chen, and Y. Huang, “Time-delay estimation via linear interpolation and cross correlation,” IEEE Transactions on Speech and Audio Processing, vol. 12, no. 5, pp. 509-519, 2004.
[18]J. Benesty, J. Chen, and Y. Huang, “Microphone array signal processing,” Springer Science & Business Media, vol. 1, 2008. ISBN: 978-3-540-78612-2.
[19]L. Wang, T. -K. Hon, J. D. Reiss, and A. Cavallaro, “An iterative approach to source counting and localization using two distant microphones,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 6, pp. 1079-1093, 2016.
[20]C. H. Knapp, and G. C. Carter, “The generalized correlation method for estimation of time delay,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 4, pp. 320-327, 1976.
[21]D. Pavlidi, A. Griffin, M. Puigt, and A. Mouchtaris, “Real-time multiple sound source localization and counting using a circular microphone array,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2193-2206, 2013.
[22]J. -S. Hu, and C. -H. Yang, “Estimation of sound source number and directions under a multisource reverberant environment,” EURASIP Journal on Advances in Signal Processing, 2010.
[23]E. C. Cherry, “Some experiments on the recognition of speech, with one ear and with two ears,” The Journal of the Acoustical Society of America, vol. 25, no. 5, pp. 975-979, 1953.
[24]A. Ozerov, E. Vincent, and F. Bimbot, “A general flexible framework for the handling of prior information of audio source separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1118-1133, 2012.
[25]T. Otsuka, K. Ishiguro, T. Yoshioka, H. Sawada, and H. G. Okuno, “Multichannel sound source dereverberation and separation for arbitrary number of sources based on Bayesian nonparametrics,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, pp. 2218-2232, 2014.
[26]M. Castella, and E. Moreau, “New kurtosis optimization schemes for MISO equalization,” IEEE Transactions on Signal Processing, vol. 60, no. 3, pp. 1319-1330, 2012.
[27]Z. Koldovský, and P. Tichavský, “Time-domain blind separation of audio sources on the basis of a complete ICA decomposition of an observation space,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 2, pp. 406-416, 2011.
[28]J. -T. Chien, and H. -L. Hsieh, “Convex divergence ICA for blind source separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 302-313, 2012.
[29]K. Rahbar, and J. P. Reilly, “A frequency domain method for blind source separation of convolutive audio mixtures,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 832-844, 2005.
[30]H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee, and K. Shikano, “Blind source separation based on a fast-convergence algorithm combining ICA and beamforming,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 2, pp. 666-678, 2006.
[31]K. Matsuoka, “Minimal distortion principle for blind source separation,” Proceedings of the 41st SICE Annual Conference, vol. 4, pp. 2138-2143, 2002.
[32]Y. Zhang, K. Cao, K. Wu, T. Yu, and N. Zhou, “Audio-visual underdetermined blind source separation algorithm based on Gaussian potential function,” China Communications, vol. 11, no. 6, pp. 71-80, 2014.
[33]M. S. Khan, S. M. Naqvi, A. ur-Rehman, W. Wang, and J. Chambers, “Video-aided model-based source separation in real reverberant rooms,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 9, pp. 1900-1912, 2013.
[34]J. B. Allen, and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” The Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943-950, 1979.
[35]RIR Generator, https://github.com/ehabets/RIR-Generator .
[36]Speechnotes, https://speechnotes.co/ .
[37]C. Loader, “Local regression and likelihood,” Springer Science & Business Media, 2006.
[38]J. Nikunen, and T. Virtanen, “Direction of arrival based spatial covariance model for blind sound source separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 3, pp. 727-739, 2014.
[39]A. M. Kondoz, “Digital speech: coding for low bit rate communication systems,” John Wiley & Sons, 2005.
[40]Z. Zivkovic, and F. Van Der Heijden, “Efficient adaptive density estimation per image pixel for the task of background subtraction,” Pattern recognition letters, vol. 27, no. 7, pp. 773-780, 2006.
