
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 王聖富
Author (English): Sheng-Fu Wang
Title: 使用差分貝氏資訊準則及支援向量機於混合語言語音自動分段與辨識
Title (English): Automatic Segmentation and Identification of Mixed-Language Speech Using delta-BIC and Support Vector Machines
Advisor: 陳嘉平
Advisor (English): Chia-Ping Chen
Degree: Master's
Institution: National Sun Yat-sen University
Department: Department of Computer Science and Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2008
Graduation Academic Year: 96 (ROC calendar)
Language: English
Pages: 66
Keywords (Chinese): identification; delta Bayesian information criterion; segmentation; support vector machines
Keywords (English): LID; delta-BIC; Segmentation; Support Vector Machines
Statistics:
  • Cited: 0
  • Views: 215
  • Rating: none
  • Downloads: 0
  • Bookmarked: 0
This thesis proposes methods for segmenting and identifying mixed-language speech data.
Automatic language identification can be divided into four steps: feature extraction, segmentation, segment classification, and re-labeling. For feature extraction, we compare two features: the group delay feature (GDF) and the conventional Mel-frequency cepstral coefficient (MFCC). Unlike conventional features, which are taken from the magnitude of the Fourier transform, the GDF uses the phase spectrum. For language segmentation, we compare two methods: the delta Bayesian information criterion (delta-BIC) and support vector machines (SVMs). Delta-BIC uses acoustic features to cut the input utterance into a sequence of language-dependent segments, which are then grouped with the K-means algorithm. Finally, re-labeling identifies the language of each group. SVMs, once their models are trained, perform automatic language segmentation and identification directly.
To account for possible accent effects, we use the English Across Taiwan (EAT) corpus. Over a baseline frame accuracy of 57.77%, the system achieves 78.13%.
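The delta-BIC change detection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it scores a single candidate change point by comparing one full-covariance Gaussian against two, with a tunable penalty weight `lam`; the synthetic "frames" and all names are illustrative.

```python
import numpy as np

def delta_bic(X, t, lam=1.0):
    """Delta-BIC score for a hypothesized change at frame t.

    Positive values favor modeling X[:t] and X[t:] with two
    Gaussians over modeling all of X with a single Gaussian.
    """
    N, d = X.shape
    # log-determinants of the full-covariance ML estimates
    ld0 = np.linalg.slogdet(np.cov(X.T, bias=True))[1]
    ld1 = np.linalg.slogdet(np.cov(X[:t].T, bias=True))[1]
    ld2 = np.linalg.slogdet(np.cov(X[t:].T, bias=True))[1]
    # complexity penalty: extra mean (d) and covariance (d(d+1)/2) parameters
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)
    return 0.5 * (N * ld0 - t * ld1 - (N - t) * ld2) - lam * penalty

# Toy mixed-language "utterance": 100 frames from each of two distributions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 4)),
               rng.normal(3.0, 2.0, size=(100, 4))])
candidates = range(20, 180)
scores = [delta_bic(X, t) for t in candidates]
best = list(candidates)[int(np.argmax(scores))]  # estimated change point
```

In practice the score is scanned over candidate boundaries within a sliding window, and a boundary is accepted where the maximum delta-BIC is positive.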
This thesis proposes an approach to segmenting and identifying mixed-language speech.
Automatic LID can be divided into four steps: feature extraction, segmentation, segment clustering, and re-labeling. In feature extraction, we compare the group delay feature (GDF) with the conventional Mel-frequency cepstral coefficient (MFCC) feature. Unlike traditional features derived from the Fourier transform magnitude, the GDF uses the phase spectrum. In segmentation, we compare the delta Bayesian information criterion (delta-BIC) with support vector machines (SVMs). Delta-BIC is applied to segment the input utterance into a sequence of language-dependent segments using acoustic features. The segments are then clustered with the K-means algorithm. Finally, re-labeling determines the language of each cluster. SVMs, once their models are trained, perform segmentation and identification directly.
Considering the effect of accent, we evaluate our system on the English Across Taiwan (EAT) corpus. The experimental results show that the system reaches a frame hit rate of 78.13%, compared with a baseline of 57.77%.
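The SVM-based alternative can be illustrated, in spirit, as frame-level classification with an RBF-kernel SVM and probability estimates. This is a hedged sketch using scikit-learn rather than the thesis's actual toolkit; the synthetic 13-dimensional "MFCC-like" frames and the frame-hit-rate computation are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Synthetic stand-ins for MFCC frames of two languages (13-dim, well separated).
lang_a = rng.normal(-1.0, 1.0, size=(200, 13))
lang_b = rng.normal(+1.0, 1.0, size=(200, 13))
X_train = np.vstack([lang_a, lang_b])
y_train = np.array([0] * 200 + [1] * 200)

# RBF-kernel SVM with per-frame posterior probability estimates.
clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)

# A mixed "utterance": 50 frames of language A followed by 50 of language B.
utterance = np.vstack([rng.normal(-1.0, 1.0, size=(50, 13)),
                       rng.normal(+1.0, 1.0, size=(50, 13))])
truth = np.array([0] * 50 + [1] * 50)
pred = clf.predict(utterance)             # per-frame language labels
hit_rate = float((pred == truth).mean())  # the frame-hit-rate metric
```

Because the trained classifier labels every frame directly, segmentation boundaries fall out of the label sequence without a separate change-detection pass.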
Chinese Abstract …………………………………………………………………………… i
Abstract …………………………………………………………………………… ii
Acknowledgements …………………………………………………………………………… iii
Table of Contents …………………………………………………………………………… iv
List of Tables …………………………………………………………………………… vii
List of Figures …………………………………………………………………………… viii
1 Introduction ……………………………………………………………………… 1
1.1 Background ………………………………………………………………… 1
1.2 Motivation ………………………………………………………………… 2
1.3 Purposes …………………………………………………………………… 3
1.4 Thesis Organization ……………………………………………………… 3
2 Review …………………………………………………………………………… 5
2.1 Mono-lingual LID ………………………………………………………… 6
2.1.1 Acoustic Features …………………………………………………… 6
2.1.2 Prosody Features …………………………………………………… 7
2.1.3 Phonotactics ………………………………………………………… 9
2.1.4 Acoustic Model ……………………………………………………… 10
2.2 Mixed-language LID ……………………………………………………… 11
2.2.1 Methods for Segmentation ………………………………………… 11
2.2.2 Classifier …………………………………………………………… 13
3 Methods ………………………………………………………………………… 15
3.1 System I …………………………………………………………………… 15
3.1.1 Feature Extraction …………………………………………………… 16
3.1.2 Segmentation ………………………………………………………… 21
3.1.3 Segment Clustering …………………………………………………… 26
3.1.4 Re-label ……………………………………………………………… 27
3.2 System II …………………………………………………………………… 29
3.2.1 Types of SVMs ……………………………………………………… 33
3.2.2 Kernel Function ……………………………………………………… 34
3.2.3 Probability Estimates ………………………………………………… 35
3.3 System III ………………………………………………………………… 36
3.3.1 Shifted Delta Cepstrum …………………………………………… 38
4 Experimental Results …………………………………………………………… 40
4.1 System I …………………………………………………………………… 43
4.2 System II …………………………………………………………………… 45
4.3 System III ………………………………………………………………… 46
5 Conclusions and Future work …………………………………………………… 49
Reference …………………………………………………………………………… 51