National Digital Library of Theses and Dissertations in Taiwan

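Graduate student: 陳姿羽 (Chen, Tzu-Yu)
Thesis title (translated from the Chinese): Using autoencoding to assist deep-neural-network synthesis of personalized head-related transfer functions based on anthropometric parameters
Thesis title (English): Autoencoding HRTFs for DNN based HRTF personalization using anthropometric features
Advisor: 冀泰石 (Chi, Tai-Shih)
Oral defense committee: 曹昱 (Tsao, Yu); 王逸如 (Wang, Yih-Ru)
Oral defense date: 2018-07-20
Degree: Master's
Institution: National Chiao Tung University
Department: Institute of Communications Engineering
Discipline: Engineering
Field: Electrical and computer engineering
Thesis type: Academic thesis
Year of publication: 2018
Graduation academic year: 106 (ROC calendar)
Language: Chinese
Number of pages: 46
Keywords (translated from the Chinese): head-related transfer function; autoencoder
Keywords (English): HRTF; autoencoder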
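Abstract (translated from the Chinese): A head-related transfer function (HRTF) can be regarded as a spatial acoustic filter that describes the frequency response of the propagation path from a sound source in a given direction to the ear canal. With such a filter, we can simulate sound travelling along that path and thereby give the listener a sense of spatial sound. Because sound interacts with the listener's head, torso, and pinnae through reflection and diffraction along the path, the filter varies with the listener's body shape and the source direction. Using an unsuitable HRTF confuses the listener's spatial perception, yet measuring a listener's HRTFs directly is very time-consuming. Our goal is therefore to synthesize highly personalized HRTFs. We first apply an initialization technique from deep learning to optimize an existing deep-neural-network-based method for synthesizing personalized HRTFs, which greatly reduces training time. We then propose using an autoencoder for dimensionality reduction, extracting the important information in the HRTFs to lower the complexity of the neural network; this mitigates the high variance and unstable results caused by overfitting and also yields more accurate personalized HRTFs. In addition, with this approach we try combining different HRTF databases to improve the autoencoder's encoding and decoding performance and thus aid HRTF synthesis.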
Head-related transfer functions (HRTFs) are a set of filters that model the scattering caused by a listener's head, torso, and pinnae as a sound wave travels from the source to the ear canal. Filtering a non-spatial audio signal with HRTFs turns it into a spatial audio signal that appears to come from a specific direction. Interactions along the propagation path cause HRTFs to differ from direction to direction and from subject to subject. As a result, using another person's HRTFs can distort spatial perception, for example through front-back confusion. Unfortunately, measuring HRTFs for each user is time-consuming and expensive. We aim to propose a method for synthesizing personalized HRTFs from several anthropometric features of each user. We first apply an initialization scheme to a deep neural network to accelerate training. We then propose using an autoencoder for dimensionality reduction, extracting the important information from raw HRTFs so that the deep neural network can be simplified, which mitigates the large variance and instability that result from overfitting. In addition, with this method we try to increase the training data by combining different databases, aiming to improve the performance of the autoencoder and aid the synthesis of HRTFs.
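The two-stage idea in the abstract (compress HRTFs to a low-dimensional code, then predict that code from anthropometric features and decode it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the data are random synthetic stand-ins, a linear autoencoder solved in closed form via SVD (equivalent to PCA) replaces the trained autoencoder, ordinary least squares replaces the deep neural network, and all function names and sizes are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(hrtf, n_latent):
    """Closed-form linear autoencoder (PCA via SVD): returns mean and basis."""
    mean = hrtf.mean(axis=0)
    # Rows of vt are orthonormal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(hrtf - mean, full_matrices=False)
    return mean, vt[:n_latent]              # basis shape: (n_latent, n_bins)

def encode(hrtf, mean, basis):
    return (hrtf - mean) @ basis.T          # (n_subjects, n_latent)

def decode(code, mean, basis):
    return code @ basis + mean              # (n_subjects, n_bins)

def add_bias(anthro):
    return np.hstack([anthro, np.ones((anthro.shape[0], 1))])

def fit_regressor(anthro, code):
    """Least-squares map from anthropometric features (plus bias) to codes.
    Stand-in for the DNN described in the abstract."""
    weights, *_ = np.linalg.lstsq(add_bias(anthro), code, rcond=None)
    return weights                          # (n_anthro + 1, n_latent)

def predict_code(anthro, weights):
    return add_bias(anthro) @ weights

# Synthetic stand-in data (assumption: NOT real HRTF measurements).
rng = np.random.default_rng(0)
n_subjects, n_anthro, n_bins, n_latent = 40, 8, 64, 4
anthro = rng.normal(size=(n_subjects, n_anthro))
hrtf = (anthro @ rng.normal(size=(n_anthro, n_bins))
        + 0.1 * rng.normal(size=(n_subjects, n_bins)))

# Stage 1: learn the low-dimensional code. Stage 2: regress onto that code.
mean, basis = fit_linear_autoencoder(hrtf, n_latent)
code = encode(hrtf, mean, basis)
weights = fit_regressor(anthro, code)
synth = decode(predict_code(anthro, weights), mean, basis)

recon_err = np.mean((decode(code, mean, basis) - hrtf) ** 2)
synth_err = np.mean((synth - hrtf) ** 2)
print(f"autoencoder reconstruction MSE: {recon_err:.4f}")
print(f"synthesis MSE from anthropometry: {synth_err:.4f}")
```

The regressor here only has to predict `n_latent` values per subject rather than every frequency bin, which is the sense in which dimensionality reduction simplifies the downstream model. The errors above are in-sample; a real evaluation would hold out subjects.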
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Direction
1.3 Chapter Outline
Chapter 2 Literature Review
2.1 Psychoacoustic Spatial Cues
2.1.1 ITD and ILD
2.1.2 Head-Related Transfer Functions (HRTFs)
2.1.3 Measurement of HRTFs
2.2 Database Overview: CIPIC HRTF dataset [18]
2.3 Personalized HRTFs
2.3.1 Importance of and Methods for HRTF Personalization
2.3.2 Evaluation Metrics
2.4 Autoencoder [32]
Chapter 3 Baseline Architecture and Optimization Method
3.1 Model Architecture
3.2 Data Preprocessing
3.3 Parameter Initialization
3.4 Experimental Results
Chapter 4 System Architecture and Parameter Tuning
4.1 Autoencoder
4.1.1 Autoencoder Architecture
4.1.2 Training Data
4.1.3 Combining Multiple Databases
4.1.4 Parameter Tuning
4.2 Deep Neural Network
Chapter 5 Experimental Results and Discussion
5.1 Comparison With and Without Autoencoder Dimensionality Reduction
5.1.1 Results by Subject
5.1.2 Results by Direction
5.2 Comparison of Autoencoders Trained with Different Amounts of Data
5.2.1 Results by Subject
5.2.2 Results by Direction
Chapter 6 Conclusions and Future Work
References
[1] F. Grijalva, L. Martini, D. Florencio, and S. Goldenstein, "A Manifold Learning Approach for Personalizing HRTFs from Anthropometric Features," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 3, pp. 559-570, 2016.
[2] F. Grijalva, L. Martini, S. Goldenstein, and D. Florencio, "Anthropometric-based customization of head-related transfer functions using Isomap in the horizontal plane," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 4473-4477.
[3] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, no. 5500, p. 2319, 2000, doi: 10.1126/science.290.5500.2319.
[4] K. J. Fink and L. Ray, "Individualization of head related transfer functions using principal component analysis," Applied Acoustics, vol. 87, pp. 162-173, 2015.
[5] K. J. Fink and L. Ray, "Tuning principal component weights to individualize HRTFs," in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 389-392.
[6] D. J. Kistler and F. L. Wightman, "A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction," 1992, pp. 1637-1647.
[7] J. C. Middlebrooks and D. M. Green, "Observations on a principal components analysis of head-related transfer functions," The Journal of the Acoustical Society of America, vol. 92, no. 1, pp. 597-599, 1992.
[8] H. Hugeng, D. Gunawan, and W. Wahidin, "Effective Preprocessing in Modeling Head-Related Impulse Responses Based on Principal Components Analysis," 2010.
[9] C. J. Chun, J. M. Moon, G. W. Lee, N. K. Kim, and H. K. Kim, "Deep Neural Network Based HRTF Personalization Using Anthropometric Measurements," in Audio Engineering Society Convention 143, 2017.
[10] S. Devore, A. Ihlefeld, K. Hancock, B. Shinn-Cunningham, and B. Delgutte, "Accurate Sound Localization in Reverberant Environments is Mediated by Robust Encoding of Spatial Cues in the Auditory Midbrain," Neuron, vol. 62, no. 1, pp. 123-134, 2009.
[11] D. W. Grantham, "Spatial Hearing and Related Phenomena," in Hearing, B. C. J. Moore, Ed. San Diego: Academic Press, 1995, ch. 9, pp. 297-345.
[12] P. X. Joris and T. C. T. Yin, "A matter of time: internal delays in binaural processing," Trends in Neurosciences, vol. 30, no. 2, pp. 70-78, 2007.
[13] J. W. Strutt, "On our perception of sound direction," Philosophical Magazine, pp. 214-232, 1907.
[14] J. Blauert, Spatial Hearing. Cambridge, MA: MIT Press, 1997.
[15] V. R. Algazi, C. Avendano, and R. O. Duda, "Elevation localization and head-related transfer function analysis at low frequencies," The Journal of the Acoustical Society of America, vol. 109, no. 3, pp. 1110-1122, 2001.
[16] F. Asano, Y. Suzuki, and T. Sone, "Role of spectral cues in median plane localization," The Journal of the Acoustical Society of America, vol. 88, no. 1, pp. 159-168, 1990.
[17] V. C. Raykar, R. Duraiswami, and B. Yegnanarayana, "Extracting the frequencies of the pinna spectral notches from measured head-related impulse responses," The Journal of the Acoustical Society of America, vol. 116, no. 4, p. 2625, 2004.
[18] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, "The CIPIC HRTF database," in Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No.01TH8575), 2001, pp. 99-102.
[19] W. G. Gardner and K. D. Martin, "HRTF measurements of a KEMAR," 1995, pp. 3907-3908.
[20] N. Gumerov, A. E. O'Donovan, R. Duraiswami, and D. N. Zotkin, "Computation of the head-related transfer function via the fast multipole accelerated boundary element method and its spherical harmonic representation," 2010, pp. 370-386.
[21] C. P. Brown and R. O. Duda, "An efficient HRTF model for 3-D sound," in Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics, 1997, 4 pp.
[22] X. Liu and X. Zhong, "An improved anthropometry-based customization method of individual head-related transfer functions," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 336-339.
[23] D. Y. N. Zotkin, J. Hwang, R. Duraiswami, and L. S. Davis, "HRTF personalization using anthropometric measurements," in 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684), 2003, pp. 157-160.
[24] P. Bilinski, J. Ahrens, M. R. P. Thomas, I. J. Tashev, and J. C. Platt, "HRTF magnitude synthesis via sparse representation of anthropometric features," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 4468-4472.
[25] J. He, W. S. Gan, and E. L. Tan, "On the preprocessing and postprocessing of HRTF individualization based on sparse representation of anthropometric features," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 639-643.
[26] H. Hu, L. Zhou, J. Zhang, H. Ma, and Z. Wu, "Head Related Transfer Function Personalization Based on Multiple Regression Analysis," in 2006 International Conference on Computational Intelligence and Security, 2006, vol. 2, pp. 1829-1832.
[27] Q. H. Huang and Q. L. Zhuang, "HRIR personalisation using support vector regression in independent feature space," Electronics Letters, vol. 45, no. 19, pp. 1002-1003, 2009.
[28] M. Zhang, R. A. Kennedy, T. D. Abhayapala, and W. Zhang, "Statistical method to identify key anthropometric parameters in HRTF individualization," in 2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2011, pp. 213-218.
[29] H. Gamper, D. Johnston, and I. J. Tashev, "Interaural time delay personalisation using incomplete head scans," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 461-465.
[30] M. Aussal, F. Alouges, and B. Katz, "ITD Interpolation and Personalization for Binaural Synthesis Using Spherical Harmonics," 2012, pp. 04_01-10.
[31] I. Tashev, "HRTF phase synthesis via sparse representation of anthropometric features," in 2014 Information Theory and Applications Workshop (ITA), 2014, pp. 1-5.
[32] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[33] G. Kearney and T. Doyle, "A HRTF Database for Virtual Loudspeaker Rendering," presented at the 139th International Convention of the Audio Engineering Society, New York, October 2015.
[34] O. Warusfel, "The LISTEN HRTF database," 2013.
[35] H. M. Fayek, "On Data-Driven Approaches to Head-Related-Transfer Function Personalization," 2017.