National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

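Author: Po-hung Wu (吳柏宏)
Thesis title: Self-Organizing Map on Auditory-Scene based Sound Segregation (自組織映射圖應用於聽覺場景式語音分離)
Advisor: Tai-shih Chi (冀泰石)
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Communication Engineering (電信工程系所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2008
Academic year of graduation: 97 (ROC calendar)
Language: Chinese
Pages: 67
Keywords (Chinese): 語音分離; 自組織; 語音處理
Keywords (English): Speech segregation; Self organized; Speech processing; SOM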
Over the past decade, detailed characteristics of auditory perception have been widely incorporated into speech processing algorithms to improve their performance. In sound segregation, for example, multi-microphone algorithms such as independent component analysis (ICA) are often used and perform satisfactorily. Humans, however, can segregate mixed sounds with only one ear. In this thesis, we design a monaural speech segregation system based on an auditory perceptual model. Various spectro-temporal cues extracted from the model are used for segregation. A self-organizing feature map (SOM) neural network is then used to mimic neural function by grouping and clustering a mixed sound into separate sounds. Finally, we demonstrate the system's performance by comparing the separated sounds with the original sounds.
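As a rough illustration of the clustering step the abstract mentions, the following is a minimal sketch of a one-dimensional self-organizing map in plain Python. It is not the thesis's implementation: this record gives neither the feature vectors (the thesis derives them from pitch, frequency modulation, onset/offset and amplitude modulation cues) nor the map size or training schedule. The function names `train_som` and `best_matching_unit`, the two-unit map, the linear decay schedules and the toy two-dimensional inputs are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the unit whose weights are closest to x."""
    dists = [sum((wk - xk) ** 2 for wk, xk in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_som(samples, n_units=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOM on a list of equal-length tuples; return the weights.

    All hyperparameters are illustrative assumptions, not values from the thesis.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise each unit from a randomly chosen training sample.
    weights = [list(rng.choice(samples)) for _ in range(n_units)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood width decay linearly over training.
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 1e-3)
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood measured on the 1-D unit index.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


if __name__ == "__main__":
    # Toy stand-ins for feature vectors belonging to two sound sources.
    source_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    source_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
    som = train_som(source_a + source_b)
    print([best_matching_unit(som, x) for x in source_a + source_b])
```

After training, each input is assigned to its best-matching unit, so inputs that share a unit are treated as belonging to the same group. In a segregation setting the inputs would be per-unit cue vectors from the auditory model rather than the toy points used here.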
Chinese Abstract
English Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Auditory Scene Analysis
1.3 Research Method
1.4 Chapter Outline
Chapter 2 Basic Introduction to the Auditory Perception Model and the System
2.1 Introduction to the Auditory Perception Model
2.1.1 Basic Structure of the Ear
2.1.2 Physiological Phenomena of the Early Stage
2.1.3 Auditory Perception Model: Simulation of the Early Stage
2.1.4 Auditory Perception Model: Cortical Stage
2.2 Basic Introduction to the System
2.2.1 Speech Corpus
2.2.2 System Flow
Chapter 3 Speech Feature Extraction
3.1 Pitch Extraction
3.1.1 Definition of Pitch and Related Psychoacoustic Experiments
3.1.2 Construction of Harmonic Templates
3.1.3 Pitch Extraction Mechanism
3.1.4 Experimental Results of the Pitch Extraction Mechanism
3.2 Frequency Modulation Extraction
3.2.1 Definition of Frequency Modulation
3.2.2 Extracting Frequency Modulation Using the Auditory Model
3.3 Sound Onset and Offset Extraction
3.3.1 Definition of Onset and Offset
3.3.2 Extracting Onset and Offset Using the Auditory Model
3.4 Amplitude Modulation Extraction
3.4.1 Definition of Amplitude Modulation
3.4.2 Extracting Amplitude Modulation Using the Auditory Model
Chapter 4 Speech Segregation
4.1 Introduction to Neural Networks
4.1.1 Artificial Neurons
4.1.2 Neural Network Architecture
4.1.3 Neural Network Learning Algorithms
4.2 Introduction to Self-Organizing Maps
4.2.1 Basic Concepts of Self-Organizing Maps
4.2.2 Basic Architecture and Parameters of Self-Organizing Maps
4.2.3 Self-Organizing Map Algorithm
4.3 Speech Segregation Mechanism
4.3.1 Speech Segregation Using SOM
4.3.2 Experimental Setup and Results
4.3.3 Experimental Setup
4.3.4 Experimental Results
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
[1]. Neural System Laboratory, http://www.isr.umd.edu/Labs/NSL/.
[2]. TIMIT Acoustic-Phonetic Continuous Speech Corpus,
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1
[3]. T. Chi, P. Ru and S. A. Shamma, “Multiresolution spectrotemporal analysis of complex
sounds,” Journal of the Acoustical Society of America, vol. 118, no. 2, pp. 887-906,
August 2005.
[4]. T. Chi, Y. Gao, M. C. Guyton, P. Ru and S. A. Shamma, “Spectro-temporal modulation
transfer function and speech intelligibility,” Journal of the Acoustical Society of America,
vol. 106, no. 5, pp. 2719-2732, November 1999.
[5]. H. Duifhuis, L. F. Willems and R. J. Sluyter, “Measurement of pitch in speech: An
implementation of Goldstein’s theory of pitch perception,” Journal of the Acoustical
Society of America, vol. 71, no. 6, pp. 1568-1580, June 1982.
[6]. J. L. Goldstein, “An optimum processor for the central formation of pitch of complex
tone,” Journal of the Acoustical Society of America, vol. 54, no. 6, pp. 1496-1516, 1973.
[7]. T. W. Parsons, “Separation of speech from interfering speech by means of harmonic
selection,” Journal of the Acoustical Society of America, vol. 60, no. 4, pp. 911-918,
October 1976.
[8]. N. Grimault, S. P. Bacon and C. Micheyl, “Auditory stream segregation on the basis of
amplitude-modulation rate,” Journal of the Acoustical Society of America, vol. 111, no. 3,
pp. 1340-1348, March 2002.
[9]. S. McAdams, “Segregation of concurrent sounds I: Effect of frequency modulation
coherence,” Journal of the Acoustical Society of America, vol. 86, no. 6, pp. 2149-2159,
December 1989.
[10]. J. F. Culling and Q. Summerfield, “The role of frequency modulation in the perceptual
segregation of concurrent vowels,” Journal of the Acoustical Society of America, vol. 98,
no. 2, pp. 837-846, August 1995.
[11]. C. J. Darwin, V. Ciocca and G. J. Sandell, “Effect of frequency and amplitude modulation
on the pitch of a complex tone with mistuned harmonic,” Journal of the Acoustical
Society of America, vol. 95, no. 5, pp. 2631-2636, May 1994.
[12]. K. Wang and S. A. Shamma, “Spectral shape analysis in the central auditory system,”
IEEE Trans. Speech Audio Processing, vol. 3, no. 5, pp. 382–395, September 1995.
[13]. G. Hu and D. Wang, “Monaural speech segregation based on tracking and amplitude
modulation,” IEEE Trans. Neural Networks, vol. 15, no. 5, pp. 1135–1150, September
2004.
[14]. Q. Summerfield, J. F. Culling and A. J. Fourcin, “Auditory Segregation of Competing
voices: absence of effects of FM or AM coherence,” Philosophical Trans. Royal Society
Lond.B 336, pp. 357–366, 1992.
[15]. S. Rosen, “Temporal information in speech: acoustic, auditory and linguistic aspects,”
Philosophical Trans. Royal Society Lond.B 336, pp. 367–373, 1992.
[16]. M. Elhilali and S. A. Shamma, “A Biologically-inspired approach to the cocktail party
problem,” In Proc. ICASSP, vol. 5, pp. 637–640, 2006.
[17]. G. J. Brown and M. Cooke, “Computational auditory scene analysis,” Computer Speech
and Language, vol. 8, pp. 297-336, 1994.
[18]. M. Cooke and D. P. W. Ellis, “The auditory organization of speech and other sources in
listeners and computational models,” Speech Communication, vol. 35, pp. 141-177,
2001.
[19]. P. Ru and S. A. Shamma, “Representation of musical timbre in auditory cortex,” Journal
of New Music Research, vol. 26, pp. 154-169, 1997.
[20]. A. Palmer and S. A. Shamma, “Physiological Representations of speech,” in Speech
Processing in the Auditory System, S. Greenberg, W. A. Ainsworth, A. N. Popper and R.
R. Fay, Eds.: Springer 2004.
[21]. C. J. Darwin, “Pitch and Auditory Grouping,” in Pitch: Neural coding and perception, C.
J. Plack, A. J. Oxenham, R. R. Fay and A. N. Popper, Eds.: Springer 2005.
[22]. D. K. Mellinger and B. M. Mont-Reynaud, “Scene Analysis,” in Auditory Computation,
H. L. Hawkins, T. A. McMullen, A. N. Popper, and R. R. Fay, eds. Springer-Verlag, New
York, 1996.
[23]. S. A. Shamma, “Auditory cortical representation of complex acoustic spectra as
inferred from the ripple analysis method,” in Network: Computation in Neural
Systems, vol. 3, no. 7, pp. 439-476, 1996.
[24]. A. S. Bregman, Auditory Scene Analysis, MIT Press, 1990.
[25]. T. Kohonen, Self-organizing Maps, Springer Verlag, 1995.
[26]. W. Hu, D. Xie and T. Tan, “A Hierarchical Self-organizing approach for learning the
patterns of motion trajectories,” IEEE Trans. Neural Networks, vol. 15, no. 1, pp.
135–144, January 2004.
[27]. M. T. Hagan, H. B. Demuth and M. H. Beale, Neural Network Design, PWS Pub. Co.
1995.
[28]. L. V. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms And
Applications (Pie), Prentice Hall 1993.
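[29]. 張斐章, 張麗秋, 類神經網路 (Artificial Neural Networks), 東華書局, Taipei, 2005.
[30]. 陳桂霞, 黃重光, 自組織映射圖網路簡介 (An Introduction to Self-Organizing Map Networks), Graduate Institute of Educational Measurement and Statistics, National Taichung Teachers College (國立台中師範學院教育測驗統計研究所).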