
臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

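Author: 林榮三
Author (English): Rong-San Lin
Thesis title: 語音編碼器複雜度之簡化與實現
Thesis title (English): Complexity Reduction and Implementation of Speech Coders
Advisor: 楊家輝
Advisor (English): Jar-Ferr Yang
Degree: Doctoral
Institution: National Cheng Kung University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2002
Graduation academic year: 90 (ROC calendar)
Language: Chinese
Pages: 107
Keywords (translated from Chinese): fast algorithms; time-varying pitch gain prediction algorithm; residual transform DCT; bit-vector search algorithm; variable rate; glottal excitation pulses
Keywords (English): fast algorithms; time-varied pitch gain predictor algorithm; residual transform-based DCT; successive bit-vector search algorithm; VBRT; glottal pulses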
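Abstract (translated from Chinese): This thesis proposes several fast algorithms to reduce the computational complexity and improve the speech quality of existing speech coders. First, we propose a first-order time-varying pitch gain prediction algorithm to reduce the computational complexity of the original fifth-order pitch predictor of G.723.1. To suit the varying bandwidth of today's Internet and wireless communications, we propose a simple residual-transform DCT variable-rate speech coder with low stochastic codebook encoding complexity. We also propose a successive bit-vector search algorithm to reduce the computational complexity of the codebook search. Finally, to overcome the drawback of the conventional multi-pulse maximum-likelihood linear predictive coder, which uses a short train of discontinuous pulses as its excitation signal, we propose a new set of glottal excitation pulses to improve the quality of the synthesized speech.
In general, a higher-order linear pitch prediction filter predicts the speech signal better, but it requires much more computation, which prevents real-time implementation. To reduce the computation, we propose a first-order time-varying pitch gain prediction algorithm, which replaces the conventional fixed gain within a frame with a pitch gain that varies linearly over time. Because speech changes continuously, the method can track the amplitude variations of the speech signal in both transient and steady-state periods. Experimental results show that the proposed first-order time-varying pitch gain predictor yields speech quality very close to that synthesized by the G.723.1 codec with its fifth-order pitch predictor, while saving more than 99.76% of the computation. We also propose VBRT, which encodes the residual signal with DCT coefficients; it requires less computation than CELP, with a reasonable degradation in speech quality. Another important feature of VBRT is that it can easily be configured as a variable-rate speech codec for communications with variable quality of service. Moreover, this structure has a delay of only 7.5 ms, whereas a typical CELP coder has a delay of up to 30 ms. With its low delay and variable rate, VBRT is therefore suitable for Internet and wireless communication systems whose bandwidth varies. To reduce the codebook search complexity and make real-time encoder implementation easier, we propose a successive bit-vector search algorithm, which decomposes each codevector into several bit-vectors; the convolution of a bit-vector can then be reduced to additions, which greatly lowers the computation. Theoretical analysis and experimental results show that the proposed method reduces the codebook search complexity while maintaining a reasonable level of speech quality.
Finally, to improve the synthesized speech quality of the conventional multi-pulse maximum-likelihood linear predictive coder, we propose a new set of glottal excitation pulses that replaces each single pulse. This models an excitation signal closer to the natural one and gives better results. A series of formal signal-to-noise ratio and segmental SNR (SEGSNR) tests shows that the method synthesizes better speech quality than the conventional MP-MLQ coder.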
In this thesis, we propose several fast algorithms to reduce the computational complexity and improve the speech quality of current speech coders. First, a time-varied pitch gain predictor algorithm is proposed to reduce the computational complexity of the fifth-order pitch predictor in G.723.1. Second, the VBRT transform-based vocoder is proposed to reduce the encoding complexity of the stochastic codebook and to adapt to the varying bandwidth of modern Internet and wireless communications. Third, we propose a successive bit-vector search approach to reduce the computational complexity of the codebook search in code-excited linear prediction (CELP) speech coders. Finally, a new glottal pulse excitation model is proposed to improve speech quality in multi-pulse maximum likelihood quantization (MP-MLQ) coders.
In general, the higher the order of the long-term prediction (LTP) filter, the better the prediction of speech. However, the computational cost of higher-order LTP filters becomes a serious obstacle to real-time implementation. To reduce this cost, we propose a time-varied pitch gain predictor algorithm. Because speech is non-stationary, the proposed LTP method tracks the detailed fluctuation of speech amplitude in both transient and stationary periods. Simulation results show that the proposed method achieves speech quality close to that of the fifth-order pitch predictor of the G.723.1 coder while saving more than 99.76% of the computation.
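As an illustration only, a one-tap pitch predictor whose gain varies linearly across the frame can be fitted by least squares. The sketch below assumes the gain has the form g0 + g1*n and that the pitch lag has already been chosen; the function names, the 60-sample frame and the synthetic signal are hypothetical and are not taken from the thesis.

```python
import math

def fit_linear_gain(target, delayed):
    """Least-squares fit of target[n] ~ (g0 + g1*n) * delayed[n].

    `delayed` is the past excitation shifted by an already chosen pitch lag.
    The linear gain form is an assumption made for this sketch.
    """
    a00 = a01 = a11 = b0 = b1 = 0.0
    for n, (s, p) in enumerate(zip(target, delayed)):
        a00 += p * p
        a01 += n * p * p
        a11 += n * n * p * p
        b0 += s * p
        b1 += n * s * p
    det = a00 * a11 - a01 * a01
    if det <= 1e-12 * a00 * a11:
        # Ill-conditioned system: fall back to a constant gain.
        g0 = b0 / a00 if a00 > 0.0 else 0.0
        return g0, 0.0
    # Solve the 2x2 normal equations by Cramer's rule.
    g0 = (b0 * a11 - b1 * a01) / det
    g1 = (a00 * b1 - a01 * b0) / det
    return g0, g1

def predict(delayed, g0, g1):
    """Apply the time-varying gain to the delayed excitation."""
    return [(g0 + g1 * n) * p for n, p in enumerate(delayed)]

# Synthetic check: a periodic signal whose amplitude grows over the frame.
delayed = [math.sin(2.0 * math.pi * n / 40.0) for n in range(60)]
target = [(0.5 + 0.01 * n) * p for n, p in enumerate(delayed)]
g0, g1 = fit_linear_gain(target, delayed)
print(g0, g1)
```

Only two parameters are solved for per frame here, compared with five taps in a fifth-order predictor; the savings reported in the abstract come from the thesis's own method and measurements, not from this sketch.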
The proposed VBRT transform-based vocoder, which uses DCT coefficients to encode the residual signal, requires less computation than CELP coders at slightly lower quality. Another important feature of the VBRT coder is that its structure can easily be configured as a variable bit-rate coder to support variable quality-of-service communications. Furthermore, the proposed coding structure has a delay of only 7.5 ms, much lower than that of CELP coders. With its low delay and variable bit rate, the VBRT coder is well suited to the varying bandwidth of modern Internet and wireless communications.
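A rough sketch of how a transform coder can trade bit rate for quality: transform a residual frame with the DCT, keep only some coefficients, and reconstruct. The code assumes a 60-sample frame (7.5 ms at an assumed 8 kHz sampling rate) and keeps the largest-magnitude coefficients, a simple stand-in for the perceptual selection and vector quantization stages listed in the thesis outline. All names and the test signal are hypothetical.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a real sequence (direct O(N^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * acc)
    return out

def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        acc = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            acc += math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(acc)
    return out

def keep_largest(coeffs, count):
    """Zero all but the `count` largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    keep = set(order[:count])
    return [c if k in keep else 0.0 for k, c in enumerate(coeffs)]

def reconstruction_error(frame, count):
    """Squared error after keeping `count` DCT coefficients."""
    rebuilt = idct2(keep_largest(dct2(frame), count))
    return sum((a - b) ** 2 for a, b in zip(frame, rebuilt))

# Hypothetical residual frame: 60 samples = 7.5 ms at an assumed 8 kHz rate.
residual = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(60)]
for count in (5, 15, 30, 60):
    print(count, reconstruction_error(residual, count))
```

Because the transform is orthonormal, the error equals the energy of the discarded coefficients, so keeping more coefficients (a higher rate) never increases the error.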
In the CELP encoder, computation is dominated by the adaptive and stochastic codebook searches, since both must perform convolutions in a closed-loop fashion. We propose a successive bit-vector search approach, which decomposes each codevector into several bit-vectors so that the convolution in the linear predictive coding model can be simplified. Theoretical analyses and simulation results show that the computational complexity of the codebook searches can be greatly reduced with negligible degradation of speech quality.
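The decomposition idea can be illustrated with a small sketch. It assumes codevector entries are non-negative integers that split into binary bit planes, which may differ from the thesis's actual decomposition, and it shows only the convolution identity, not the successive search itself. Filtering a bit-vector needs only additions of shifted impulse responses, and the planes are recombined with power-of-two weights. All names and values are hypothetical.

```python
def convolve_truncated(h, c):
    """Direct convolution of codevector c with impulse response h,
    truncated to the frame length (one multiply per term)."""
    n = len(c)
    y = [0.0] * n
    for i in range(n):
        for j in range(min(i + 1, len(h))):
            y[i] += h[j] * c[i - j]
    return y

def bit_planes(c, bits):
    """Split non-negative integers into `bits` binary vectors, LSB first."""
    return [[(v >> b) & 1 for v in c] for b in range(bits)]

def filter_bit_vector(h, v):
    """Filter a 0/1 vector: add a shifted copy of h for every set bit.
    No multiplications are needed."""
    n = len(v)
    y = [0.0] * n
    for pos, bit in enumerate(v):
        if bit:
            for j in range(min(len(h), n - pos)):
                y[pos + j] += h[j]
    return y

def filter_by_bit_planes(h, c, bits):
    """Recombine filtered bit planes with power-of-two weights."""
    n = len(c)
    y = [0.0] * n
    for b, v in enumerate(bit_planes(c, bits)):
        part = filter_bit_vector(h, v)
        for i in range(n):
            y[i] += part[i] * (1 << b)  # a shift in fixed-point arithmetic
    return y

# Hypothetical impulse response and 3-bit codevector.
h = [1.0, 0.5, 0.25, 0.125]
c = [3, 0, 5, 1, 7, 2, 0, 4]
print(convolve_truncated(h, c))
print(filter_by_bit_planes(h, c, bits=3))
```

Both calls produce the same filtered vector because convolution is linear; the bit-plane route replaces general multiplications with additions and shifts.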
Traditional MP-MLQ LPC vocoders represent the excitation signal with a small set of discontinuous pulses, described by amplitudes at non-uniformly spaced positions. A new glottal pulse excitation model is proposed to improve speech quality in MP-MLQ. A series of formal SNR and SEGSNR tests shows that it achieves higher quality than the traditional MP-MLQ model.
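For reference, the two objective measures named above can be computed as in the following sketch. It uses the common definitions (overall SNR, and segmental SNR as the mean of per-frame SNRs in dB). The 60-sample frame length, the skipping of silent or error-free frames, and the test signal are assumptions of this sketch and are not details from the thesis.

```python
import math

def snr_db(reference, decoded):
    """Overall signal-to-noise ratio in dB."""
    signal = sum(s * s for s in reference)
    noise = sum((s - d) ** 2 for s, d in zip(reference, decoded))
    return 10.0 * math.log10(signal / noise)

def segsnr_db(reference, decoded, frame_len=60):
    """Segmental SNR: mean of per-frame SNRs in dB.

    Frames with zero signal or zero error energy are skipped, which is
    one simple convention among several in use.
    """
    values = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(s * s for s in ref)
        noise = sum((s - d) ** 2 for s, d in zip(ref, dec))
        if signal > 0.0 and noise > 0.0:
            values.append(10.0 * math.log10(signal / noise))
    if not values:
        raise ValueError("no frame with non-zero signal and error energy")
    return sum(values) / len(values)

# Hypothetical signals: the "decoded" copy is attenuated by 10 percent.
reference = [math.sin(0.2 * n) for n in range(240)]
decoded = [0.9 * s for s in reference]
print(snr_db(reference, decoded), segsnr_db(reference, decoded))
```

Averaging in the dB domain gives quiet frames the same weight as loud ones, which is why SEGSNR is usually reported alongside the overall SNR for speech coders.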
1. Introduction
1.1 Background
1.2 Low Bit Rate Speech Coding Standards
1.3 Organization of Thesis
2. Low Bit Rate Speech Coding Algorithms
2.1 MPEG Low Bit Rate Speech Coders
2.2 HVXC of MPEG-4 Standard
2.3 Analysis-by-Synthesis Coding: CELP Model
2.4 G.723.1 High-Rate Speech Coding Algorithm
2.4.1 Linear Predictive Coefficients Quantization
2.4.2 LSP Decoder and Interpolation
2.4.3 Formant Perceptual Weighting Filter and Harmonic Noise Shaping
2.4.4 Impulse Response Calculator
2.4.5 Adaptive Codebook Search Algorithm
2.4.6 MP-MLQ Codebook Search
3. Time-Varied Pitch Gain Predictor Algorithms
3.1 Overview of LTP Filters with Speech Coder
3.2 Pitch and Linear Predictive Filter
3.3 Time-Varied Gain Pitch Predictor
3.4 Experiment Results
3.5 Conclusions
4. Transform-Based Variable Bit Rate Speech Vocoders Implementation
4.1 Overview of CELP
4.2 DCT-based Speech Vocoders
4.3 DCT-based Baseline Vocoder
4.3.1 Simplified Two-Tap Adaptive Predictor
4.3.2 DCT Transformation
4.3.3 Perceptual Selections of DCT Coefficients
4.3.4 Perceptual Vector Quantization of DCT Magnitudes
4.3.5 Excitation Postfilter in the Decoder
4.3.6 High Rate Coders
4.4 Computation Analysis and Performance Evaluation
4.5 Conclusions
5. Successive Bit-Vector Search Algorithm for CELP Vocoders
5.1 Overview of CELP Speech Coder
5.2 CELP Coder Structure
5.2.1 Adaptive Codebook Search
5.2.2 Stochastic Codebook Search
5.2.3 General Excitation Codebook Search
5.3 Successive Bit-Vector Codebook Search
5.4 Computation Analyses and Simulation Results
5.5 Conclusions
6. Improved Multi-Pulse LPC Vocoders With Glottal Pulse Excitation
6.1 Overview of MPLPC/ITU G.723.1
6.2 Multipulse Excitation Model
6.3 Glottal Pulse Excitation Model
6.4 Simulation Results
6.5 Diminutive Conclusions
7. Conclusions
Bibliography
Publication List