Author (English name): Wei-yi Tsai
Thesis title (English): Design of Temporal Filters Based on Modulation Spectrum for Robust Speech Recognition
Advisor (English name): Jeih-weih Hung
Keywords (English): robust speech recognition; temporal filter; modulation spectra
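This thesis aims to improve the temporal filters applied to feature parameters in a speech recognition system, in order to further increase recognition robustness. Data-driven temporal filters designed under various optimization criteria have previously been shown to raise recognition accuracy in noisy environments. However, the design of those filters is usually based on the statistics of the features in the time domain. This thesis proposes three new data-driven temporal filters that instead use the statistics of the features in the modulation frequency domain: Constrained Principal Component Analysis (C-PCA), Constrained Linear Discriminant Analysis (C-LDA), and Constrained Maximum Class Distance (C-MCD). The optimal temporal filter coefficients are obtained directly from the modulation-domain statistics. We also combine these new techniques with conventional cepstral normalization methods, namely Cepstral Mean and Variance Normalization (CMVN) and Cepstral Gain Normalization (CGN). All recognition experiments in this thesis use the internationally adopted Aurora 2.0 digit speech database. Preliminary experimental results show that, when applied to Mel-Frequency Cepstral Coefficients (MFCC), the three proposed temporal filters significantly improve recognition accuracy and perform well across the various noise environments. They are also additive with CMVN and CGN, which further improves recognition accuracy.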
Computers and related products have become a necessity of modern life. Many of them are small, lightweight, or even invisible to the user, so traditional man-machine interfaces such as the keyboard and mouse are no longer convenient. Voice, on the other hand, is a natural and efficient way for people to communicate with such devices; with well-developed speech recognition techniques, "talking" with machines is no longer a dream.
However, the performance of a speech recognizer is often limited by its application environment. For example, background noise and channel effects often degrade recognition accuracy severely. Researchers have proposed many approaches to enhance recognizer performance in adverse environments. In this thesis, we focus on developing new temporal filtering techniques for speech features in order to improve their robustness in noisy speech recognition.
The proposed temporal filters are based on the statistics of the modulation spectrum of speech features. They are derived from constrained versions of Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Maximum Class Distance (MCD), respectively.
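As a rough illustration of the two notions used above, the sketch below computes a modulation spectrum and applies a temporal filter to a feature stream. It assumes the common reading in which the modulation spectrum is the magnitude of the Fourier transform of each feature coefficient's trajectory over frames, and in which a temporal filter is an FIR filter run along the frame axis. The function names, the array layout (frames by coefficients), and the filter taps are hypothetical; the constrained PCA, LDA, and MCD derivations of the taps are not reproduced here.

```python
import numpy as np

def modulation_spectrum(features):
    """Magnitude spectrum of each coefficient's trajectory over time.

    features: array of shape (num_frames, num_coeffs), e.g. MFCCs.
    Returns an array of shape (num_frames // 2 + 1, num_coeffs).
    (Assumed definition; the thesis may use a power spectrum instead.)
    """
    features = np.asarray(features, dtype=float)
    return np.abs(np.fft.rfft(features, axis=0))

def apply_temporal_filter(features, taps):
    """Filter every coefficient trajectory with the same FIR taps.

    Assumes num_frames >= len(taps); mode="same" keeps the frame count.
    """
    features = np.asarray(features, dtype=float)
    taps = np.asarray(taps, dtype=float)
    columns = [np.convolve(features[:, k], taps, mode="same")
               for k in range(features.shape[1])]
    return np.stack(columns, axis=1)
```

In a data-driven design, the taps passed to `apply_temporal_filter` would come from statistics gathered over training utterances rather than being chosen by hand.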
The results of a series of experiments conducted on the Aurora 2.0 database show that the proposed temporal filters effectively enhance recognition performance in noisy environments, and that they can be integrated with other temporal filtering approaches, namely Cepstral Mean and Variance Normalization (CMVN) and Cepstral Gain Normalization (CGN), to provide further improvements.
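For readers unfamiliar with CMVN, a minimal per-utterance sketch follows. It assumes the usual formulation in which each cepstral coefficient is normalized to zero mean and unit variance over the frames of one utterance. The function name, the array layout, and the small `eps` guard are illustrative choices rather than details taken from the thesis. CGN is not sketched here.

```python
import numpy as np

def cmvn(features, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.

    features: array of shape (num_frames, num_coeffs).
    eps is a hypothetical guard against division by zero for
    coefficients that are constant over the utterance.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)
```

Under these assumptions, a temporal filter could be applied to the normalized stream, or normalization could follow filtering; the abstract does not state the order used in the experiments.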
