Graduate student (English name): Sheng-Fu Wang
Thesis title (English): Automatic Segmentation and Identification of Mixed-Language Speech Using delta-BIC and Support Vector Machines
Advisor (English name): Chia-Ping Chen
Keywords (English): LID; delta-BIC; Segmentation; Support Vector Machines
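Automatic language identification can be divided into four steps: feature extraction, segmentation, segment classification, and re-labeling. For feature extraction, we compare two kinds of features: the group delay feature (GDF) and the conventional Mel-frequency cepstral coefficients (MFCCs). Unlike conventional features, which are taken from the magnitude of the Fourier transform, the GDF uses the phase spectrum. For language segmentation, we compare two methods: the delta-Bayesian information criterion (delta-BIC) and support vector machines (SVMs). The delta-BIC method uses acoustic features to cut the input utterance into a sequence of language-dependent segments, which are then grouped with the K-means algorithm. Finally, re-labeling identifies the language of each cluster. The SVM method, once its models are trained, performs language segmentation and identification directly.
To account for the possible influence of accent, we use the English Across Taiwan corpus of Taiwanese-accented English. Against a baseline frame accuracy of 57.77%, the system reaches 78.13%.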
This thesis proposes an approach to segmenting and identifying mixed-language speech.
Automatic language identification (LID) can be divided into four steps: feature extraction, segmentation, segment clustering, and re-labeling. In feature extraction, we compare the group delay feature (GDF) with Mel-frequency cepstral coefficients (MFCCs). Unlike traditional features, which are derived from the Fourier transform magnitude, the GDF uses the phase spectrum. In segmentation, we compare the delta Bayesian information criterion (delta-BIC) with support vector machines (SVMs). The delta-BIC is applied to acoustic features to segment the input utterance into a sequence of language-dependent segments. The segments are then clustered using the K-means algorithm. Finally, re-labeling determines the language of each cluster. The SVMs, by contrast, segment and identify the speech directly once their models have been trained.
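The delta-BIC segmentation step can be illustrated with a minimal sketch. It is not the implementation used in this thesis: it assumes the common formulation in which a window of feature frames is modeled either by one Gaussian or by two Gaussians split at a candidate frame, and a positive delta-BIC favors the split. For brevity the Gaussians are assumed to have diagonal covariance, and the names `delta_bic`, `best_change_point`, `lam`, and `margin`, as well as the synthetic data, are hypothetical.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of the ML diagonal covariance fitted to the frames."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-8))  # floor the variance to avoid log(0)
    return total


def delta_bic(frames, i, lam=1.0):
    """Delta-BIC for splitting `frames` at index i (positive favors a split).

    Assumption: diagonal-covariance Gaussians, so the split adds 2 * dim
    free parameters (one extra mean vector and one extra variance vector).
    """
    n = len(frames)
    dim = len(frames[0])
    gain = 0.5 * (
        n * log_det_diag(frames)
        - i * log_det_diag(frames[:i])
        - (n - i) * log_det_diag(frames[i:])
    )
    penalty = 0.5 * lam * (2 * dim) * math.log(n)
    return gain - penalty


def best_change_point(frames, margin=10, lam=1.0):
    """Return (index, score) of the best split, or (None, score) if none is positive."""
    scores = {i: delta_bic(frames, i, lam) for i in range(margin, len(frames) - margin)}
    i_best = max(scores, key=scores.get)
    if scores[i_best] > 0:
        return i_best, scores[i_best]
    return None, scores[i_best]


# Synthetic 2-D "features": 100 frames around 0, then 100 frames around 5.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
frames += [[rng.gauss(5.0, 1.0), rng.gauss(5.0, 1.0)] for _ in range(100)]

change_point, score = best_change_point(frames)
print(change_point, round(score, 2))
```

On real speech the frames would be GDF or MFCC vectors, and the search would typically run over a sliding window so that several change points can be found in one utterance. The penalty weight `lam` trades missed boundaries against false ones.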
To account for the effect of accent, we evaluate our system on the English Across Taiwan (EAT) corpus. The experimental results show that the system reaches a frame hit rate of 78.13%, compared with a baseline of 57.77%.
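As a small illustration of the evaluation metric, the sketch below computes a frame hit rate under the assumption that it is the fraction of frames whose predicted language label matches the reference label. The function name `frame_hit_rate`, the labels, and the toy sequences are hypothetical and are not data from the thesis.

```python
def frame_hit_rate(reference, predicted):
    """Fraction of frames whose predicted language label equals the reference."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same number of frames")
    if not reference:
        raise ValueError("at least one frame is required")
    hits = sum(r == p for r, p in zip(reference, predicted))
    return hits / len(reference)


# Toy utterance of 10 frames: the predicted language switch comes one frame early.
reference = ["zh"] * 6 + ["en"] * 4
predicted = ["zh"] * 5 + ["en"] * 5

rate = frame_hit_rate(reference, predicted)
print(f"{rate:.2%}")
```

Because the measure is computed per frame, a boundary that is misplaced by a few frames costs only those frames rather than the whole segment.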
