National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

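Author: 郭子萱
Author (English): Kuo, Tzu-Hsuan
Thesis title (Chinese): 使用人體測量參數之基於整體學習之個人化頭部相關轉換函數估計
Thesis title (English): Ensemble learning based HRTF personalization using anthropometric features
Advisor: 冀泰石
Advisor (English): Chi, Tai-Shih
Oral defense committee: 曹昱, 王逸如
Oral defense date: 2019-09-25
Degree: Master's
Institution: National Chiao Tung University
Department: Department of Electrical Engineering
Discipline: Engineering
Sub-discipline: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2019
Graduation academic year: 108 (ROC calendar)
Language: Chinese
Number of pages: 52
Keywords (translated from Chinese): head-related transfer function, deep learning, ensemble learning, anthropometry, spatial audio
Keywords (English): HRTF, deep learning, ensemble learning, anthropometric, spatial audio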
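The head-related transfer function (HRTF) is the frequency response that describes the propagation path of sound from the source position to the human ear, and it can be regarded as a spatial acoustic filter. With this filter we can simulate sound arriving from any direction and so give the listener a sense of space. Because sound interacts with the listener's torso, head, and pinnae through reflection and diffraction as it propagates, HRTFs differ from person to person according to physical characteristics. Measuring HRTFs is very time-consuming, however, and mismatched HRTFs cause listeners to make errors in spatial judgments. Our goal is therefore to synthesize personalized HRTFs accurately. We first cluster the training subjects using the anthropometric features that most influence the perception of source direction, which reduces individual differences within each group. We then combine deep learning with an ensemble learning algorithm and train in two stages, clustering and integration. By training on groups with similar anthropometric features, the neural network learns the features most representative of each group, which reduces the estimation gap between subjects and yields more accurate personalized HRTFs. We also use this method with angle information embedded in the input features and learned jointly, obtaining more comprehensive experimental results than training on a single direction.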
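As a rough illustration of treating an HRTF as a spatial filter, the sketch below convolves a mono signal with a left/right pair of head-related impulse responses (the time-domain counterpart of an HRTF). The function names and the toy impulse responses are assumptions made for illustration; they are not taken from the thesis or from any measured database.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right HRIR pair to get a binaural pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy (made-up) impulse responses: the right ear receives the sound one
# sample later and attenuated, roughly as for a source on the listener's left.
mono = [1.0, 0.5, 0.25]
hrir_left = [1.0, 0.0]
hrir_right = [0.0, 0.6]
left, right = spatialize(mono, hrir_left, hrir_right)
print(left, right)
```

In practice the impulse responses would come from measurement or from a personalization model, and an FFT-based convolution would normally replace the direct loop for signals of realistic length.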
Head-related transfer functions (HRTFs) are frequency responses that describe how a sound wave travels from the source to the ear canal, and they can be regarded as spatial acoustic filters. Filtering a non-spatial audio signal with HRTFs turns it into a spatial audio signal that appears to come from a specific direction. Because HRTFs capture the effects of the listener's head, torso, and pinnae, they are highly individual. The most accurate way to personalize HRTFs is direct measurement. However, measurement is complex, time-consuming, and expensive, and non-individualized HRTFs cause front-back and up-down confusions. We propose a method that synthesizes personalized HRTFs from several anthropometric features of each user. We first cluster the training subjects using the features that most influence direction perception, which reduces individual differences within each group. We then combine deep learning with an ensemble learning algorithm, so that training consists of an ensemble preparation stage and an ensemble integration stage. Training on a group of subjects with similar anthropometric features lets the neural network learn characteristics that are more representative of that group, and thereby synthesize more precise personalized HRTFs. In addition, we use this method with angle information embedded in the input features and obtain more comprehensive experimental results than training on a single direction.
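The cluster-then-integrate idea can be sketched in a few lines. This is a minimal illustration under strong assumptions, not the thesis's implementation: the per-cluster neural networks are replaced by the mean HRTF magnitude of each cluster, the integration stage is replaced by a fixed distance-based weighting, and all names (`kmeans`, `fit_cluster_models`, `predict`, `temperature`) and all numbers are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns (centroids, labels)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return centroids, [nearest(p, centroids) for p in points]


def fit_cluster_models(features, hrtfs, k):
    """Stage 1 stand-in: one 'model' per cluster (here just the mean HRTF)."""
    centroids, labels = kmeans(features, k)
    models = []
    for c in range(k):
        members = [h for h, lab in zip(hrtfs, labels) if lab == c]
        models.append(mean(members) if members else mean(hrtfs))
    return centroids, models


def predict(feature, centroids, models, temperature=1.0):
    """Stage 2 stand-in: blend cluster outputs, weighting closer clusters more."""
    weights = [math.exp(-dist(feature, c) / temperature) for c in centroids]
    total = sum(weights)
    return [sum(w * m[i] for w, m in zip(weights, models)) / total
            for i in range(len(models[0]))]


# Toy data: 2-D anthropometric features and 2-bin HRTF magnitudes in dB.
features = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]]
hrtfs = [[0.0, -3.0], [0.2, -3.2], [2.0, -8.0], [2.2, -8.2]]
centroids, models = fit_cluster_models(features, hrtfs, k=2)
estimate = predict([1.1, 0.95], centroids, models)
print(estimate)
```

A new subject whose features lie close to one cluster receives an estimate dominated by that cluster's model, which is the intuition behind training on anthropometrically similar groups. In a learned integration stage the blending weights would be trained rather than fixed by distance.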
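Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Direction
1.3 Chapter Outline
Chapter 2 Literature Review
2.1 Spatial Cues in Psychoacoustics
2.1.1 ITD and ILD
2.1.2 Head-Related Transfer Functions (HRTFs)
2.1.3 Measurement of HRTFs
2.2 Database Introduction: CIPIC HRTF database [26]
2.3 Personalized HRTFs
2.3.1 Importance of and Methods for Personalizing HRTFs
2.3.2 Evaluation Methods
2.4 Ensemble Learning Algorithm
Chapter 3 System Architecture
3.1 Baseline Architecture and Data Preprocessing
3.1.1 Baseline Model Architecture
3.1.2 Data Preprocessing
3.2 Subject Clustering
3.2.1 Selection of Anthropometric Parameters
3.2.2 k-means Clustering
3.3 Integrated Deep and Ensemble Learning Algorithm
3.3.1 Ensemble Preparation Stage (EP stage)
3.3.2 Ensemble Integration Stage (EI stage)
3.3.3 Training Data Configuration
3.3.4 Parameter Tuning
Chapter 4 Experimental Results and Discussion
4.1 Neural Networks Trained for Individual Directions
4.1.1 Results by Direction
4.1.2 Results by Subject
4.2 Neural Networks Trained for All Directions
4.2.1 Results by Direction
4.2.2 Results by Subject
4.3 Neural Networks Trained for All Directions with Clustering Based on LSD Distribution Plots
4.3.1 Results by Direction
4.3.2 Results by Subject
Chapter 5 Conclusions and Future Work
References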
[1] J. Chen, B. D. Van Veen, and K. E. Hecox, “A spatial feature extraction and regularization model for the head-related transfer function,” The Journal of the Acoustical Society of America, vol. 97, no. 1, pp. 1493-1510, 1995.
[2] J. C. Middlebrooks, “Individual differences in external-ear transfer functions reduced by scaling in frequency,” The Journal of the Acoustical Society of America, vol. 106, no. 3, pp. 1480-1492, 1999.
[3] C. T. Jin, P. Leong, J. Leung, A. Corderoy, and S. Carlile, “Enabling individualized virtual auditory space using morphological measurements,” in Proceedings of the First IEEE Pacific-Rim Conference on Multimedia (2000 International Symposium on Multimedia Information Processing), 2000, pp. 235-238.
[4] D. N. Zotkin, J. Hwang, R. Duraiswami, and L. S. Davis, “HRTF personalization using anthropometric measurements,” in 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No. 03TH8684), 2003, pp. 157-160.
[5] M. A. Ramirez and S. G. Rodriguez, “HRTF individualization by solving the least squares problem,” in Audio Engineering Society Convention 118, 2005.
[6] S. Xu, Z. Li, and G. Salvendy, “Individualization of head-related transfer function for three-dimensional virtual auditory display: A review,” in International Conference on Virtual Reality, 2007, pp. 397-407.
[7] S. Hwang, Y. Park, and Y.-s. Park, “Modeling and customization of head-related impulse responses based on general basis functions in time domain,” Acta Acustica united with Acustica, vol. 94, no. 6, pp. 965-980, 2008.
[8] M. Akagi and H. Hisatsune, “Admissible range for individualization of head-related transfer function in median plane,” in 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2013, pp. 326-329.
[9] Y. Luo, D. N. Zotkin, and R. Duraiswami, “Virtual autoencoder based recommendation system for individualizing head-related transfer functions,” in 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013, pp. 1-4.
[10] J. He, W. S. Gan, and E. L. Tan, “On the preprocessing and postprocessing of HRTF individualization based on sparse representation of anthropometric features,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 639-643.
[11] W. Lei and Z. Xiangyang, “New method for synthesizing personalized head-related transfer function,” in 2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), 2016, pp. 1-5.

[12] M. Buerger, S. Meier, C. Hofmann, W. Kellermann, E. Fischer, and H. Puder, “Retrieval of individualized head-related transfer functions for hearing aid applications,” in 2017 25th European Signal Processing Conference (EUSIPCO), 2017, pp. 6-10.
[13] C. J. Chun, J. M. Moon, G. W. Lee, N. K. Kim, and H. K. Kim, “Deep neural network based HRTF personalization using anthropometric measurements,” in Audio Engineering Society Convention 143, 2017.
[14] T. G. Dietterich, “Ensemble methods in machine learning,” in International workshop on multiple classifier systems, 2000, pp. 1-15.
[15] S. Devore, A. Ihlefeld, K. Hancock, B. Shinn-Cunningham, and B. Delgutte, “Accurate sound localization in reverberant environments is mediated by robust encoding of spatial cues in the auditory midbrain,” Neuron, vol. 62, no. 1, pp. 123-134, 2009.
[16] D. W. Grantham, “Spatial hearing and related phenomena,” in Hearing, B. C. J. Moore, Ed. San Diego: Academic Press, 1995, pp. 297-345.
[17] P. X. Joris and T. C. T. Yin, “A matter of time: internal delays in binaural processing,” Trends in Neurosciences, vol. 30, no. 2, pp. 70-78, 2007.
[18] J. W. Strutt, “On our perception of sound direction,” Philosophical Magazine, vol. 13, no. 74, pp. 214-232, 1907.
[19] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge, MA: MIT Press, 1997.
[20] B. C. J. Moore, An Introduction to the Psychology of Hearing, 5th ed. Academic Press.
[21] V. R. Algazi, C. Avendano, and R. O. Duda, “Elevation localization and head-related transfer function analysis at low frequencies,” The Journal of the Acoustical Society of America, vol. 109, no. 3, pp. 1110-1122, 2001.
[22] F. Asano, Y. Suzuki, and T. Sone, “Role of spectral cues in median plane localization,” The Journal of the Acoustical Society of America, vol. 88, no. 1, pp. 159-168, 1990.
[23] V. R. Algazi, R. O. Duda, R. P. Morrison, and D. M. Thompson, “Structural composition and decomposition of HRTF’s,” in Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No. 01TH8575), 2001, pp. 103-106.
[24] F. Asano, Y. Suzuki, and T. Sone, “Role of spectrum cues in median plane localization,” The Journal of the Acoustical Society of America, vol. 88, no. 1, pp. 159-168, 1990.
[25] V. C. Raykar, R. Duraiswami, and B. Yegnanarayana, “Extracting the frequencies of the pinna spectral notches in measured head-related impulse responses,” The Journal of the Acoustical Society of America, vol. 118, no. 1, pp. 364-374, 2005.
[26] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, “The CIPIC HRTF database,” in Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No.01TH8575), 2001, pp. 99-102.

[27] W. G. Gardner and K. D. Martin, “HRTF measurements of a KEMAR,” The Journal of the Acoustical Society of America, vol. 97, no. 6, pp. 3907-3908, 1995.
[28] E. M. Wenzel, M. Arruda, D. J. Kistler, and F. L. Wightman, “Localization using nonindividualized head-related transfer functions,” The Journal of the Acoustical Society of America, vol. 94, no. 1, pp. 111-123, 1993.
[29] N. A. Gumerov, A. E. O'Donovan, R. Duraiswami, and D. N. Zotkin, “Computation of the head-related transfer function via the fast multipole accelerated boundary element method and its spherical harmonic representation,” The Journal of the Acoustical Society of America, vol. 127, no. 1, pp. 370-386, 2010.
[30] C. P. Brown and R. O. Duda, “An efficient HRTF model for 3-D sound,” in Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics, 1997, pp. 4.
[31] X. Liu and X. Zhong, “An improved anthropometry-based customization method of individual head-related transfer functions,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 336-339.
[32] D. N. Zotkin, J. Hwang, R. Duraiswami, and L. S. Davis, “HRTF personalization using anthropometric measurements,” in 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No. 03TH8684), 2003, pp. 157-160.
[33] P. Runkle, A. Yendiki, and G. Wakefield, “Active sensory tuning for immersive spatialized audio,” in International Community for Auditory Display, 2000.
[34] A. Silzle, “Selection and tuning of HRTFs,” in Audio Engineering Society Convention 112, 2002.
[35] D. J. Kistler and F. L. Wightman, “A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction,” The Journal of the Acoustical Society of America, vol. 91, no. 3, pp. 1637-1647, 1992.
[36] N. Inoue, T. Kimura, T. Nishino, K. Itou, and K. Takeda, “Evaluation of HRTFs estimated using physical features,” Acoustical science and technology, vol. 26, no. 5, pp. 453-455, 2005.
[37] T. Nishino, N. Inoue, K. Takeda, and F. Itakura, “Estimation of HRTFs on the horizontal plane using physical features,” Applied Acoustics, vol. 68, no. 8, pp. 897-908, 2007.
[38] W. W. Hugeng and D. Gunawan, “Improved method for individualization of head-related transfer functions on horizontal plane using reduced number of anthropometric measurements,” arXiv preprint arXiv:1005.5137, 2010.
[39] O. Ramos and F. Tommasini, “Magnitude modelling of HRTF using principal component analysis applied to complex values,” Archives of Acoustics, vol. 39, no. 4, pp. 477-482, 2014.
[40] H. Gamper, D. Johnston, and I. J. Tashev, “Interaural time delay personalisation using incomplete head scans,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 461-465.
[41] M. Aussal, F. Alouges, and B. Katz, “ITD interpolation and personalization for binaural synthesis using spherical harmonics,” in Audio Engineering Society UK Conference, pp. 04, 2012.
[42] L. Kuncheva and C. Whitaker, “Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy,” Machine Learning, vol. 51, no. 2, pp. 181-207, 2003.
[43] P. Sollich and A. Krogh, “Learning with ensembles: How overfitting can be useful,” in Advances in Neural Information Processing Systems, 1996, pp. 190-196.
[44] G. Brown, J. Wyatt, R. Harris, and X. Yao, “Diversity creation methods: a survey and categorization,” Information Fusion, vol. 6, no. 1, pp. 5-20, 2005.
[45] J. J. G. Adeva, U. Beresi, and R. Calvo, “Accuracy and diversity in ensembles of text categorisers,” CLEI Journal, vol. 9, no. 1, pp. 1-2, 2005.
[46] T. K. Ho, “Random decision forests,” in Proceedings of the Third International Conference on Document Analysis and Recognition, 1995, pp. 278-282.
[47] X. Lu, Y. Tsao, S. Matsuda, and C. Hori, “Ensemble modeling of denoising autoencoder for speech spectrum restoration,” in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
[48] Y. Tsao and C. Lee, “An ensemble speaker and speaking environment modeling approach to robust speech recognition,” IEEE Transactions on Audio, Speech and Language Processing, vol. 17, no. 5, pp. 1025-1037, 2009.
[49] Y. Tsao, X. Lu, P. Dixon, T. Hu, S. Matsuda, and C. Hori, “Incorporating local information of the acoustic environments to MAP-based feature compensation and acoustic model adaptation,” Computer Speech and Language, vol. 28, no. 3, pp. 709-726, 2014.
[50] W. J. Lee, S. S. Wang, F. Chen, X. Lu, S. Y. Chien, and Y. Tsao, “Speech dereverberation based on integrated deep and ensemble learning algorithm,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5454-5458.
[51] F. Grijalva, L. Martini, D. Florencio, and S. Goldenstein, “A manifold learning approach for personalizing HRTFs from anthropometric features,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 3, pp. 559-570, 2016.
[52] T. Y. Chen, T. H. Kuo, and T. S. Chi, “Autoencoding HRTFs for DNN based HRTF personalization using anthropometric features,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 271-275.