National Digital Library of Theses and Dissertations in Taiwan


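Graduate student: 吳柏宏 (Po-hung Wu)

Thesis title (Chinese): 自組織映射圖應用於聽覺場景式語音分離

Thesis title (English): Self-Organizing Map on Auditory-Scene based Sound Segregation

Advisor: 冀泰石 (Tai-shih Chi)

Degree: Master's

Institution: National Chiao Tung University (國立交通大學)

Department: Department and Institute of Communications Engineering (電信工程系所)

Discipline: Engineering

Field: Electrical and Computer Engineering

Thesis type: Academic thesis

Year of publication: 2008

Language: Chinese

Keywords (translated from Chinese): speech segregation, self-organization, speech processing

Keywords (English): Speech segregation, Self organized, Speech processing, SOM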


Abstract:

Over the past decade, detailed characteristics of auditory perception have been widely incorporated into speech processing algorithms to improve their performance. In sound segregation, for example, multi-microphone algorithms such as independent component analysis (ICA) are often used and perform satisfactorily. Humans, however, can segregate mixed sounds with only one ear. In this thesis, we design a monaural speech segregation system based on an auditory perceptual model. Various spectro-temporal cues are extracted from the model, and a self-organizing feature map (SOM) neural network then mimics neural function by grouping and clustering the mixed sound into separate sounds. Finally, we demonstrate the system's performance by comparing the separated sounds with the original sounds.
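To illustrate the clustering idea the abstract describes, the following is a minimal sketch of a one-dimensional self-organizing map in Python with NumPy. It is not the thesis's implementation: the function names (`train_som`, `assign_units`), the linear decay schedules, and the synthetic two-dimensional points standing in for spectro-temporal cue vectors are all assumptions made for illustration.

```python
import numpy as np


def train_som(data, n_units=2, n_epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map and return its weight vectors."""
    rng = np.random.default_rng(seed)
    # Initialize each unit's weights from a randomly chosen training sample.
    init_idx = rng.choice(len(data), size=n_units, replace=False)
    weights = data[init_idx].astype(float)
    positions = np.arange(n_units)  # unit coordinates on the 1-D map
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 1e-3)  # neighborhood width shrinks
            # Best-matching unit: the unit whose weights are closest to x.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighborhood centered on the best-matching unit.
            h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
            # Move every unit toward x, scaled by its neighborhood value.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign_units(data, weights):
    """Label each sample with the index of its nearest map unit."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


# Hypothetical stand-in for cue vectors from two sources:
# two well-separated groups of synthetic 2-D points.
data_rng = np.random.default_rng(1)
source_a = data_rng.normal(0.0, 0.3, size=(20, 2))
source_b = data_rng.normal(5.0, 0.3, size=(20, 2))
features = np.vstack([source_a, source_b])

weights = train_som(features)
labels = assign_units(features, weights)
print(labels)
```

In this toy setting each map unit is expected to settle near one group of points, so the unit index serves as a source label. A real system would replace the synthetic points with cue vectors computed from the auditory model and would likely use a larger map.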

Table of contents:

Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Auditory Scene Analysis
1.3 Research Method
1.4 Chapter Outline
Chapter 2 Introduction to the Auditory Perceptual Model and the System
2.1 The Auditory Perceptual Model
2.1.1 Basic Structure of the Ear
2.1.2 Physiological Phenomena of the Early Stage
2.1.3 Auditory Perceptual Model: Simulation of the Early Stage
2.1.4 Auditory Perceptual Model: Cortical Stage
2.2 System Overview
2.2.1 Speech Corpus
2.2.2 System Flow
Chapter 3 Speech Feature Extraction
3.1 Pitch Extraction
3.1.1 Definition of Pitch and Related Psychoacoustic Experiments
3.1.2 Construction of Harmonic Templates
3.1.3 Pitch Extraction Mechanism
3.1.4 Experimental Results of the Pitch Extraction Mechanism
3.2 Frequency Modulation Extraction
3.2.1 Definition of Frequency Modulation
3.2.2 Frequency Modulation Extraction Using the Auditory Model
3.3 Sound Onset and Offset Extraction
3.3.1 Definition of Onset and Offset
3.3.2 Onset and Offset Extraction Using the Auditory Model
3.4 Amplitude Modulation Extraction
3.4.1 Definition of Amplitude Modulation
3.4.2 Amplitude Modulation Extraction Using the Auditory Model
Chapter 4 Speech Segregation
4.1 Introduction to Neural Networks
4.1.1 Artificial Neurons
4.1.2 Neural Network Architectures
4.1.3 Neural Network Learning Algorithms
4.2 Introduction to Self-Organizing Maps
4.2.1 Basic Concepts of Self-Organizing Maps
4.2.2 Basic Architecture and Parameters of Self-Organizing Maps
4.2.3 The Self-Organizing Map Algorithm
4.3 Speech Segregation Mechanism
4.3.1 Speech Segregation Using SOM
4.3.2 Experimental Setup and Results
4.3.3 Experimental Setup
4.3.4 Experimental Results
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

