臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Author: 賴緯秩
Author (English): Lai, Wei-Chih
Title (Chinese): 影音整合處理來判斷多音源與人物關係
Title (English): Relationship of Multiple Sound Sources and Subjects Determined by Integrated Audio Visual Processing
Advisor: 陳自強 (Chen, Tzu-Chiang)
Committee: 陳自強 (Chen, Tzu-Chiang); 賴文能 (Lai, Wen-Neng); 黃敬群 (Huang, Ching-Chun); 薛幼苓 (Hsueh, Yu-Ling)
Oral Defense Date: 2018-01-17
Degree: Master's
Institution: 國立中正大學 (National Chung Cheng University)
Department: 電機工程研究所 (Graduate Institute of Electrical Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2018
Graduation Academic Year: 106 (2017–18)
Language: Chinese
Pages: 76
Keywords (Chinese): 居家照護, 麥克風陣列, 環景攝影機, 盲蔽訊號分離, 聲源數量, 聲源方位
Keywords (English): home care, microphone array, panoramic camera, blind source separation, sound source counting, sound source localization
Statistics:
  • Cited by: 1
  • Hits: 236
  • Downloads: 3
  • Bookmarked: 1
This thesis applies a microphone array together with a panoramic camera to identifying multiple sound sources and their types for home care. On the microphone-array side, we record signals in which several sources are active at once and first estimate how many sources are present; the overall source-counting accuracy is 92.7%. We then run a blind source separation (BSS) algorithm with the estimated source count as its input, producing as many estimated source signals as there are sources; each estimated signal represents the audio of one source, and in our experiments the signal-to-interference ratio reaches up to 11 dB. The BSS algorithm also yields a separation matrix, and we extract its phase information to estimate each source's direction; the rate of direction errors below 20 degrees is 45.6%. However, room reverberation degrades direction estimation, so we apply a suitable filter that attenuates the first reflection path, raising the rate of direction errors below 20 degrees to 46.5%. In addition, we use the panoramic camera's video as an aid: background subtraction finds the changing regions in the camera's view, and if a changing region is a quadrilateral we know it is a television screen. Each changing region yields a corresponding direction, which can be matched against the directions of the separated signals to identify the sounds emitted by the television and by people; in the future, the system can support home care by focusing on human speech.
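As a rough illustration of the source-counting step described above (and outlined in Sections 3.2.1.1 to 3.2.1.4: single-source frames, time differences, smoothed histogram, peak detection), the following numpy-only sketch collects per-frame time-difference-of-arrival estimates between two microphones into a smoothed histogram and counts its peaks. The frame length, synthetic delays, and thresholds are illustrative assumptions, not the thesis's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_tdoa(x1, x2, frame_len=256):
    """Per-frame delay (in samples) of x2 relative to x1 via cross-correlation."""
    delays = []
    for start in range(0, len(x1) - frame_len, frame_len):
        a = x1[start:start + frame_len]
        b = x2[start:start + frame_len]
        cc = np.correlate(b, a, mode="full")      # lag axis spans -(N-1)..(N-1)
        delays.append(int(np.argmax(cc)) - (frame_len - 1))
    return np.array(delays)

def count_sources(delays, max_lag=20, smooth=3):
    """Histogram the delays, smooth with a moving average, count local maxima."""
    hist, _ = np.histogram(delays, bins=np.arange(-max_lag, max_lag + 2))
    sm = np.convolve(hist, np.ones(smooth) / smooth, mode="same")
    peaks = [i for i in range(1, len(sm) - 1)
             if sm[i] > sm[i - 1] and sm[i] >= sm[i + 1]
             and sm[i] > 0.1 * sm.max()]          # ignore tiny noise bins
    return len(peaks)

# Two synthetic sources that alternate in time, observed with
# inter-microphone delays of -5 and +7 samples respectively.
n = 4096
s1 = rng.standard_normal(n)
s2 = rng.standard_normal(n)
mic1 = np.concatenate([s1, s2])
mic2 = np.concatenate([np.roll(s1, -5), np.roll(s2, 7)])

delays = frame_tdoa(mic1, mic2)
print(count_sources(delays))  # two histogram peaks -> 2 sources
```

The histogram-of-TDOAs idea works because frames dominated by a single source cluster around that source's delay; frames containing mixtures scatter and are suppressed by the smoothing and the peak threshold.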
In this thesis, we use a microphone array with a panoramic camera to identify multiple sound sources for home care. With the microphone array, we record the sound field and determine how many sound sources exist in the environment; the overall source-counting accuracy is 92.7%. A blind source separation (BSS) algorithm then takes the estimated source count as its input and separates the mixed signal, and each estimated signal represents the sound information of one source. In our experiments, the signal-to-interference ratio (SIR) reaches up to 11 dB. The BSS algorithm yields a separation matrix, and we take its phase information to determine the direction of each sound source; the direction estimation is correct 45.6% of the time. However, when echoes exist in the environment, direction estimation degrades, so we use a one-tap pitch filter to attenuate the first reflection path and improve the azimuth accuracy; the correct-direction rate then increases to 46.5%. In addition, we use the video from the panoramic camera as a supplement: a background subtraction method finds the changing regions within the camera's view, and if a changing region is a quadrilateral, we conclude that it is a television screen. Each changing region yields a corresponding direction, which can be compared with the directions of the separated signals to distinguish the sound emitted by the television from that of a person. In the future, we aim to support home care by analyzing human speech.
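The dereverberation step can be sketched as a one-tap filter y[n] = x[n] - g * x[n - T] that subtracts a scaled, delayed copy of the signal, attenuating the first reflection; the lag T is estimated from the autocorrelation peak, as in Section 3.2.4.2. The gain, lag search range, and synthetic echo below are illustrative assumptions, not the thesis's measured values.

```python
import numpy as np

def estimate_lag(x, min_lag=10, max_lag=500):
    """Lag of the strongest autocorrelation peak within [min_lag, max_lag]."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # non-negative lags only
    return min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))

def one_tap_filter(x, lag, gain=0.6):
    """y[n] = x[n] - gain * x[n - lag]: suppress the first reflection."""
    y = x.copy()
    y[lag:] -= gain * x[:-lag]
    return y

# Synthetic "room": direct path plus a single echo 120 samples later.
rng = np.random.default_rng(1)
dry = rng.standard_normal(2000)
echo = np.zeros_like(dry)
echo[120:] = 0.6 * dry[:-120]
wet = dry + echo

lag = estimate_lag(wet)
cleaned = one_tap_filter(wet, lag, gain=0.6)
print(lag)  # estimated first-reflection delay
print(np.sum((cleaned - dry) ** 2) < np.sum((wet - dry) ** 2))  # echo energy reduced
```

Note that the subtraction also injects a weaker second-order image of the echo at lag 2T, which is why the filter reduces, rather than fully removes, the reflection energy.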
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Background and Related Work
2.1 Developments in Home Care
2.2 Microphone Arrays and Their Applications
2.2.1 Noise Reduction
2.2.2 Sound Source Localization (SSL)
2.2.3 Source Separation
2.3 Blind Source Separation (BSS)
2.3.1 Time-Domain BSS
2.3.2 Frequency-Domain BSS
2.3.2.1 The Scaling Problem
2.3.2.2 The Permutation Problem
2.4 Related Work on Audio-Visual Integration
Chapter 3 Speaker Utterance Recognition for the Home Environment
3.1 Software and Equipment
3.2 System Flow and Methods
3.2.1 Sound Source Counting
3.2.1.1 Finding Single-Source Frames
3.2.1.2 Computing Time Differences
3.2.1.3 Histogram and Smoothed Curve
3.2.1.4 Peak Detection
3.2.2 Blind Source Separation (BSS)
3.2.3 Sound Source Direction Estimation
3.2.3.1 Computing Distance Differences
3.2.3.2 Source Direction Information
3.2.4 Filtering
3.2.4.1 Framing and Selecting Voiced Frames
3.2.4.2 Finding and Refining the Delay via Auto-correlation
3.2.4.3 The Filter
3.2.5 Detecting Moving Regions in Video
3.2.5.1 Background Subtraction
3.2.5.2 Television Screen Detection
Chapter 4 Experimental Results and Discussion
4.1 Sound Source Counting
4.2 Sound Source Direction Estimation
Chapter 5 Conclusions and Future Work
References
Appendix 1
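The moving-region detection of Section 3.2.5.1 rests on background subtraction: a background model is maintained over time, and pixels that deviate from it are flagged as changed. The sketch below is a minimal numpy version using a simple exponential running-average background model; the thesis cites an adaptive Gaussian-mixture model [40] instead, and the frame sizes, learning rate, and threshold here are arbitrary illustrative choices.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running-average background model."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=30.0):
    """Pixels whose absolute difference from the background exceeds thresh."""
    return np.abs(frame - bg) > thresh

# Synthetic 64x64 grey frames: a flat scene plus a bright 10x10 square
# that slides to the right by 3 pixels per frame.
frames = []
for t in range(10):
    f = np.full((64, 64), 100.0)
    f[20:30, 5 + 3 * t:15 + 3 * t] = 200.0
    frames.append(f)

bg = np.full((64, 64), 100.0)   # background initialised to the empty scene
for f in frames:
    mask = foreground_mask(bg, f)
    bg = update_background(bg, f)

print(int(mask.sum()))  # pixels flagged in the last frame: the 10x10 moving square
```

The slow learning rate keeps static scenery (and, in the thesis's setting, a rectangular region such as a television screen) distinguishable from transient motion; the flagged region's centroid can then be mapped to an azimuth for matching against the audio directions.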

[1]TransGlobe Life (全球人壽) retirement information, government key report, https://www.transglobe.com.tw/transglobe-retireplan/content/10400 .
[2]Amazon Echo, https://www.amazon.com/Amazon-Echo-Bluetooth-Speaker-with-WiFi-Alexa/dp/B00X4WHP5E .
[3]Google Home, https://madeby.google.com/intl/en_us/home/ .
[4]Tunstall Healthcare, http://www.tunstallhealthcare.com.au/ .
[5]AT&T EverThere, http://www.goodhousekeeping.com/health-products/health-tracker-reviews/a30201/a-t-and-t-everthere/ .
[6]Guider Technology Co., Ltd. (蓋德科技股份有限公司), http://www.guidercare.com/ .
[7]SecuFirst home-care series, digital wireless home audio/video monitor, http://www.secufirst.com.tw/products_detail.aspx?Pid=19 .
[8]Amazon Echo Teardown - iFixit, https://www.ifixit.com/Teardown/Amazon+Echo+Teardown/33953 .
[9]Google Home Teardown - iFixit, https://www.ifixit.com/Teardown/Google+Home+Teardown/72684 .
[10]K. S. R. Murty, and B. Yegnanarayana, “Combining evidence from residual phase and MFCC features for speaker recognition,” IEEE Signal Processing Letters, vol. 13, no. 1, pp. 52-55, 2006.
[11]S. Chu, S. Narayanan, and C. -C. J. Kuo, “Environmental sound recognition with time-frequency audio features,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 6, pp. 1142-1158, 2009.
[12]H. D. Tran, and H. Li, “Sound event recognition with probabilistic distance SVMs,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1556-1568, 2011.
[13]Delay Sum Beamforming - The Lab Book Pages, http://www.labbookpages.co.uk/audio/beamforming/delaySum.html .
[14]H. Xia, K. Yang, Y. Ma, Y. Wang, and Y. Liu, “Noise reduction method for acoustic sensor arrays in underwater noise,” IEEE Sensors Journal, vol. 16, no. 24, pp. 8972-8981, 2016.
[15]X. Alameda-Pineda, and R. Horaud, “A geometric approach to sound source localization from time-delay estimates,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 6, pp. 1082-1095, 2014.
[16]X. Shi, B. D. O. Anderson, G. Mao, Z. Yang, J. Chen, and Z. Lin, “Robust localization using time difference of arrivals,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1320-1324, 2016.
[17]J. Benesty, J. Chen, and Y. Huang, “Time-delay estimation via linear interpolation and cross correlation,” IEEE Transactions on Speech and Audio Processing, vol. 12, no. 5, pp. 509-519, 2004.
[18]J. Benesty, J. Chen, and Y. Huang, “Microphone array signal processing,” Springer Science & Business Media, vol. 1, 2008. ISBN: 978-3-540-78612-2.
[19]L. Wang, T. -K. Hon, J. D. Reiss, and A. Cavallaro, “An iterative approach to source counting and localization using two distant microphones,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 6, pp. 1079-1093, 2016.
[20]C. H. Knapp, and G. C. Carter, “The generalized correlation method for estimation of time delay,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 4, pp. 320-327, 1976.
[21]D. Pavlidi, A. Griffin, M. Puigt, and A. Mouchtaris, “Real-time multiple sound source localization and counting using a circular microphone array,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2193-2206, 2013.
[22]J. -S. Hu, and C. -H. Yang, “Estimation of sound source number and directions under a multisource reverberant environment,” EURASIP Journal on Advances in Signal Processing, 2010.
[23]E. C. Cherry, “Some experiments on the recognition of speech, with one ear and with two ears,” The Journal of the Acoustical Society of America, vol. 25, no. 5, pp. 975-979, 1953.
[24]A. Ozerov, E. Vincent, and F. Bimbot, “A general flexible framework for the handling of prior information of audio source separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1118-1133, 2012.
[25]T. Otsuka, K. Ishiguro, T. Yoshioka, H. Sawada, and H. G. Okuno, “Multichannel sound source dereverberation and separation for arbitrary number of sources based on Bayesian nonparametrics,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, pp. 2218-2232, 2014.
[26]M. Castella, and E. Moreau, “New kurtosis optimization schemes for MISO equalization,” IEEE Transactions on Signal Processing, vol. 60, no. 3, pp. 1319-1330, 2012.
[27]Z. Koldovský, and P. Tichavský, “Time-domain blind separation of audio sources on the basis of a complete ICA decomposition of an observation space,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 2, pp. 406-416, 2011.
[28]J. -T. Chien, and H. -L. Hsieh, “Convex divergence ICA for blind source separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 302-313, 2012.
[29]K. Rahbar, and J. P. Reilly, “A frequency domain method for blind source separation of convolutive audio mixtures,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 832-844, 2005.
[30]H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee, and K. Shikano, “Blind source separation based on a fast-convergence algorithm combining ICA and beamforming,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 2, pp. 666-678, 2006.
[31]K. Matsuoka, “Minimal distortion principle for blind source separation,” Proceedings of the 41st SICE Annual Conference, vol. 4, pp. 2138-2143, 2002.
[32]Y. Zhang, K. Cao, K. Wu, T. Yu, and N. Zhou, “Audio-visual underdetermined blind source separation algorithm based on Gaussian potential function,” China Communications, vol. 11, no. 6, pp. 71-80, 2014.
[33]M. S. Khan, S. M. Naqvi, A. ur-Rehman, W. Wang, and J. Chambers, “Video-aided model-based source separation in real reverberant rooms,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 9, pp. 1900-1912, 2013.
[34]J. B. Allen, and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” The Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943-950, 1979.
[35]RIR Generator, https://github.com/ehabets/RIR-Generator .
[36]Speechnotes, https://speechnotes.co/ .
[37]C. Loader, “Local regression and likelihood,” Springer Science & Business Media, 2006.
[38]J. Nikunen, and T. Virtanen, “Direction of arrival based spatial covariance model for blind sound source separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 3, pp. 727-739, 2014.
[39]A. M. Kondoz, “Digital speech: coding for low bit rate communication systems,” John Wiley & Sons, 2005.
[40]Z. Zivkovic, and F. Van Der Heijden, “Efficient adaptive density estimation per image pixel for the task of background subtraction,” Pattern recognition letters, vol. 27, no. 7, pp. 773-780, 2006.
