|
Mel-frequency cepstral coefficients (MFCCs) are a type of speech feature parameter derived mainly from the physical characteristics of human hearing. They represent speech well, are more straightforward to compute than LPC cepstral coefficients (LPCC), and yield a good recognition rate. Speech recognition has recently been built mainly on the hidden Markov model (HMM), which performs well in many related applications, so we use that model to verify recognition performance once the parameters have been extracted. Our goal is to implement the MFCC algorithm in hardware so that it can serve as the feature extraction module of a complete recognition system in the future.

In this thesis, we first analyze in detail the computational load of the original MFCC algorithm. We use a simplified cosine table-lookup method to cut both the memory requirement and the number of multiplications to one half. Second, by exploiting the mapping between the mel scale and the frequency scale, we halve both the multiplications and the memory needed to compute the weighted energy spectrum. Finally, we compute the logarithm with a modified partitioned logarithm table look-up method, which preserves the original accuracy while requiring fewer intermediate operations during look-up and reducing the table storage by 50%.

The hardware architecture was designed from the modified algorithm using the TSMC 0.6 μm standard cell library. The chip measures 3.2 × 3.3 mm², is packaged with 120 pins, and has a gate count of about 10,000. Its maximum operating frequency is 50 MHz, which fully meets the requirements of real-time speech feature calculation.
|
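The abstract does not spell out how the simplified cosine table-lookup method works. As a rough illustration of how a cosine table and its multiplications can both be halved, the sketch below uses the symmetry cos(n(K-1-k+0.5)π/K) = (-1)^n cos(n(k+0.5)π/K) in the final cosine transform of the K log filter-bank energies. This is an assumed reading rather than the thesis's documented design; the function names, the even filter count, and the unnormalized transform are all hypothetical choices made for the example.

```python
import math

def dct_direct(log_energies, num_coeffs):
    """Reference cosine transform: one multiplication per (n, k) pair."""
    K = len(log_energies)
    return [
        sum(log_energies[k] * math.cos(n * (k + 0.5) * math.pi / K)
            for k in range(K))
        for n in range(1, num_coeffs + 1)
    ]

def build_half_table(K, num_coeffs):
    """Store cosines for the first K/2 filters only (assumes K is even)."""
    assert K % 2 == 0, "this sketch assumes an even number of filters"
    half = K // 2
    return [
        [math.cos(n * (k + 0.5) * math.pi / K) for k in range(half)]
        for n in range(1, num_coeffs + 1)
    ]

def dct_folded(log_energies, num_coeffs, table):
    """Fold mirrored inputs first, then multiply once per folded pair."""
    K = len(log_energies)
    half = K // 2
    coeffs = []
    for n in range(1, num_coeffs + 1):
        # Mirrored terms share a cosine up to the sign (-1)^n.
        sign = -1.0 if n % 2 else 1.0
        acc = 0.0
        for k in range(half):
            folded = log_energies[k] + sign * log_energies[K - 1 - k]
            acc += folded * table[n - 1][k]
        coeffs.append(acc)
    return coeffs
```

With, say, 20 filters and 12 coefficients, `build_half_table(20, 12)` holds 10 entries per coefficient instead of 20, and `dct_folded` performs 10 multiplications per coefficient instead of 20, at the cost of one extra addition or subtraction per pair.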