National Digital Library of Theses and Dissertations in Taiwan

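Graduate student: 林怡君
Graduate student (English name): Yi-Chun Lin
Thesis title: 語者辨識於音訊量化飽和下之效能改善
Thesis title (English): Performance Improvement of Speaker Recognition for Clipped Audio Signals
Advisor: 蔡偉和
Oral defense committee: 趙怡翔, 黃士嘉, 黃文增
Oral defense date: 2012-07-31
Degree: Master's
Institution: National Taipei University of Technology
Department: Graduate Institute of Computer and Communication Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Pages: 38
Keywords (Chinese): 訊號裁切, 訊號修復, 語者驗證
Keywords (English): audio signal clipping, signal restoration, speaker verification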
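  Abstract (translated from the Chinese): This thesis examines how quantization saturation in recorded audio affects the performance of speaker recognition systems, and how to mitigate it. When speech is captured by a microphone and sampled, values that are too large cause the quantized signal to be amplitude-limited, that is, saturated, which is commonly called clipping (爆音). Clipped audio is unpleasant to listen to and also severely degrades automatic recognition performance. Several existing techniques can partially restore clipped audio and improve its sound quality, but experiments show that the restored speech often differs greatly from the original speaker's voice, which works against automatic speaker recognition. To make the recognition system more robust to clipping, this thesis proposes a clipped-segment deletion mechanism that removes clipped segments unsuitable for speaker recognition and keeps segments that contain clipping but still allow the speaker to be identified. In experiments on the NIST2001 SRE database, the proposed method reduces the equal error rate by about 10%.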

  This thesis investigates speaker verification when the recorded speech signals are clipped because of quantization saturation. Clipped audio is not only unpleasant to listen to but also detrimental to speaker verification systems. Although a number of restoration techniques can improve the auditory quality of clipped speech, we find that restoration can significantly change the speaker characteristics of the signal; hence, these techniques are of little help for speaker verification. To solve this problem, this study proposes improving speaker verification by pruning the clipped signals instead of restoring them. To keep the test utterance from being shortened too severely by the pruning, we develop methods that detect and discard the speech frames containing harmful clipped signals while keeping the frames containing acceptable clipped signals. Experiments on the NIST2001 SRE database show that the proposed methods reduce the equal error rate of speaker verification by around 10%.
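
The pruning idea in the abstract can be illustrated with a minimal sketch. This record does not give the thesis's actual detection criteria, so everything below is an assumption made for illustration: the function names, the frame length and hop, the saturation level `limit`, and the rule of dropping a frame once the fraction of saturated samples exceeds a hypothetical threshold `max_ratio`.

```python
import math


def clip(signal, limit):
    """Simulate quantization saturation by limiting amplitude to [-limit, limit]."""
    return [max(-limit, min(limit, s)) for s in signal]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def clipped_ratio(frame, limit, tol=1e-9):
    """Fraction of samples in the frame that sit at the saturation level."""
    hits = sum(1 for s in frame if abs(s) >= limit - tol)
    return hits / len(frame)


def prune_frames(frames, limit, max_ratio):
    """Keep frames whose clipped ratio is at most max_ratio (assumed threshold)."""
    return [f for f in frames if clipped_ratio(f, limit) <= max_ratio]


# Toy input: a 200 Hz tone sampled at 8 kHz, quiet at first, then too loud.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) * (1.5 if n >= 400 else 0.5)
        for n in range(800)]
clipped = clip(tone, limit=1.0)

frames = frame_signal(clipped, frame_len=160, hop=80)
kept = prune_frames(frames, limit=1.0, max_ratio=0.2)
print(f"kept {len(kept)} of {len(frames)} frames")
```

Only the surviving frames would then go on to feature extraction and scoring. Setting `max_ratio` above zero reflects the abstract's point that lightly clipped frames may still carry usable speaker information, so not every frame that contains clipping is discarded.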

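Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Related Work
1.2.1 Overview of Speaker Recognition Systems
1.2.2 Research Background
1.3 Chapter Overview
Chapter 2 Baseline Speaker Verification System
2.1 Preprocessing
2.2 Feature Extraction
2.2.1 Frame Blocking
2.2.2 Hamming Window
2.2.3 Fast Fourier Transform
2.2.4 Mel-Scale Triangular Band-pass Filters
2.3 Gaussian Mixture Model-Universal Background Model Concepts
2.3.1 Gaussian Mixture Model (GMM)
2.3.2 Gaussian Mixture Model-Universal Background Model (GMM-UBM)
Chapter 3 Restoration of Clipped Audio
3.1 Amplitude Attenuation
3.2 Filter Adjustment
Chapter 4 Clipped-Frame Deletion Methods
4.1 System Architecture of Clipped-Frame Deletion
4.2 Clipping-Ratio-Based Deletion
4.2.1 Conceptual Flow of Clipping-Ratio-Based Deletion
4.2.2 Index Construction for Clipping-Ratio-Based Deletion
4.3 Clipping-Identification-Based Deletion
4.3.1 Conceptual Flow of Clipping-Identification-Based Deletion
4.3.2 Building the Energy-Saturation GMM
4.3.3 Clipped-Frame Identification
Chapter 5 Clipping Detection Procedure and Experimental Results
5.1 Database
5.2 Mel-Frequency Cepstral Coefficient Settings
5.3 Experimental Procedure
5.3.1 Generating Energy-Saturated Audio Files
5.3.2 Training Speaker Models
5.3.2.1 GMM-UBM Model
5.3.2.2 Adapted Speaker Models
5.3.3 Speaker Testing
5.3.3.1 Experiment 1: No Restoration, No Deletion
5.3.3.2 Experiment 2: Restoration with Adobe Audition
5.3.3.3 Experiment 3: Clipping-Ratio-Based Deletion
5.3.3.4 Experiment 4: Clipping-Identification-Based Deletion
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References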




