
National Digital Library of Theses and Dissertations in Taiwan


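Graduate student: 李彥澤 (Yan-Tze Lee)
Thesis title: A Study on Vector Quantization for Line Spectral Frequencies (Chinese title: 線頻譜頻率向量量化器之研究)
Advisor: 李清坤 (Ching-Kuen Lee)
Degree: Master's
Institution: Tatung University (大同大學)
Department: Graduate Institute of Electrical Engineering
Field: Engineering
Discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: English
Number of pages: 48
Keywords: Line Spectral Frequencies; Vector Quantization; Speech Coding (Chinese: 線頻譜頻率; 向量量化; 語音編碼)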
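In most current speech codecs, a linear prediction (LP) filter is used to model the human vocal tract. The coefficients of this filter are usually represented in the frequency domain by line spectral pairs (LSP), also called line spectral frequencies (LSF’s). LSF’s are currently one of the best choices for representing the LP filter parameters in speech coding, and this thesis investigates them.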
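In low bit rate speech coding, representing the filter parameters, i.e., the line spectral pairs, with as few bits as possible without degrading the quality of the synthesized speech has long been an important problem. Early studies showed that accurate scalar quantization requires about 34 to 36 bits. Among vector quantizers, the recently proposed split codebook vector quantizer requires 24 bits, and the 4-stage vector quantizer in the U.S. Federal Standard FS-1017 MELP (Mixed Excitation Linear Prediction) speech codec requires 25 bits.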
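Building on this work, this thesis designs new LSF vector quantizer structures that reduce the transmission bit rate and the spectral distortion while preserving the quality of the synthesized speech. Starting from 2-stage vector quantization, we designed and evaluated several vector quantizer structures, aiming to lower the required bit rate without degrading the synthesized speech. Based on our experimental results, we propose the two best-performing structures: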
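(1) The LSF’s are quantized directly. To exploit the high correlation of the LSF’s between adjacent frames, the first stage uses, in addition to the fixed codebook, an adaptive codebook that stores several previously quantized LSF vectors. The M-Best search method is used in the codeword search of both stages to improve the chance of finding the best result.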
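(2) Our experiments show that subtracting an experimentally derived mean LSF vector from the LSF’s before the first-stage quantization yields LSF differences with a lower dynamic range than the original LSF’s. These differences are therefore used as the input of the first stage, and the codeword search again uses the M-Best method to improve the chance of finding the best result.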
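Our experimental results show that these two structures need only 24 and 22 bits, respectively, to meet the quantization transparency requirement of an average spectral distortion below 1 dB, and both outperform the split codebook vector quantizer.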

Most speech coding systems use a linear prediction (LP) filter to model the vocal tract. The coefficients of the LP filter are generally represented in the frequency domain by line spectral pairs (LSP), also called line spectral frequencies (LSF’s). LSF’s are currently one of the best representations of the LP filter coefficients, and this thesis discusses them in some detail.
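As a minimal sketch of the LSF representation, the function below (`lpc_to_lsf` is a hypothetical name) forms the sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z) and locates their roots on the unit circle by a grid scan followed by bisection. It assumes A(z) = 1 + a_1 z^-1 + ... + a_p z^-p with even order p and a stable filter; the thesis may use a different sign convention or root-finding method.

```python
import math


def _real_part(coeffs, omega, symmetric):
    """Evaluate an (anti)symmetric polynomial on the unit circle as a real function.

    For a symmetric polynomial of degree N, e^{j*omega*N/2} * P(e^{j*omega}) equals
    sum_k c_k cos(omega*(k - N/2)); for an antisymmetric one it equals -j times
    sum_k c_k sin(omega*(k - N/2)). The zeros of these sums are the LSF's.
    """
    half = (len(coeffs) - 1) / 2.0
    trig = math.cos if symmetric else math.sin
    return sum(c * trig(omega * (k - half)) for k, c in enumerate(coeffs))


def lpc_to_lsf(a, grid=2048, iters=50):
    """Convert LP coefficients a = [a_1, ..., a_p] to LSF's in radians, sorted in (0, pi)."""
    p = len(a)
    n = p + 1                                   # degree of P(z) and Q(z)
    full = [1.0] + list(a) + [0.0]              # a_0 .. a_{p+1}, with a_{p+1} = 0
    p_coeffs = [full[k] + full[n - k] for k in range(n + 1)]  # symmetric P(z)
    q_coeffs = [full[k] - full[n - k] for k in range(n + 1)]  # antisymmetric Q(z)
    lsf = []
    for coeffs, symmetric in ((p_coeffs, True), (q_coeffs, False)):
        def f(w):
            return _real_part(coeffs, w, symmetric)
        # Scan the open interval (0, pi); the trivial roots at z = 1 and z = -1 are skipped.
        prev_w = math.pi / grid
        prev_f = f(prev_w)
        for i in range(2, grid):
            w = math.pi * i / grid
            fw = f(w)
            if (prev_f < 0) != (fw < 0):        # a sign change brackets one root
                lo, hi, flo = prev_w, w, prev_f
                for _ in range(iters):          # refine the root by bisection
                    mid = 0.5 * (lo + hi)
                    fm = f(mid)
                    if (fm < 0) == (flo < 0):
                        lo, flo = mid, fm
                    else:
                        hi = mid
                lsf.append(0.5 * (lo + hi))
            prev_w, prev_f = w, fw
    return sorted(lsf)
```

For `a = [0.0] * 10` (that is, A(z) = 1) the ten LSF’s come out equally spaced at kπ/11, k = 1, ..., 10. For a stable A(z) the roots of P(z) and Q(z) interlace on the unit circle, so the sorted list alternates between the two polynomials.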
For low bit rate speech coding applications, quantizing these coefficients with as few bits as possible is a very important issue. Earlier research showed that a scalar quantizer needs about 34 to 36 bits/frame to quantize LSF’s accurately. More recently, the split codebook LSF vector quantizer and the 4-stage LSF vector quantizer adopted in the Federal Standard 1017 mixed excitation linear prediction (MELP) speech coder need 24 and 25 bits/frame, respectively.
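These bit counts translate directly into codebook size. A short calculation, assuming each stage (or split part) of a structured quantizer has its own codebook of 2^b entries and a single search path, shows why one unstructured 24-bit codebook is impractical. The 8+8+8 and 7+6+6+6 allocations and the 20 ms frame length are hypothetical examples, not figures from this thesis.

```python
def codevectors_searched(stage_bits):
    """Codevectors stored and searched (single path) when each stage or split
    part has a codebook of 2**b entries."""
    return sum(2 ** b for b in stage_bits)


def bit_rate_bps(bits_per_frame, frame_ms):
    """Bit rate spent on the LSF's alone, in bits per second."""
    return bits_per_frame * 1000.0 / frame_ms


if __name__ == "__main__":
    print(codevectors_searched([24]))          # unstructured 24-bit VQ: 16777216
    print(codevectors_searched([8, 8, 8]))     # hypothetical 3-way split: 768
    print(codevectors_searched([7, 6, 6, 6]))  # hypothetical 4-stage, 25 bits: 320
    print(bit_rate_bps(24, 20.0))              # 1200.0 at an assumed 20 ms frame
```

Split and multistage structures keep storage and search cost in the hundreds of codevectors at the price of some coding efficiency, which is the trade-off the thesis’s 2-stage schemes also work within.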
Building on this research, the basic idea of this thesis is to design new quantization schemes that reduce the transmission bit rate and the spectral distortion while maintaining the quality of the synthesized speech. We design and investigate several vector quantization schemes, all based on 2-stage vector quantization, with the aim of representing the quantized LSF’s with fewer bits without degrading the synthesized speech.
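In 2-stage vector quantization the second stage quantizes the error left by the first. The sketch below assumes unweighted squared Euclidean distance and trains each stage in turn with plain k-means (the generalized Lloyd algorithm) using a simple deterministic initialization; the thesis’s own training procedure and distance measure may differ, and all function names are hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest(x, codebook):
    """Index of the codevector closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))


def train_codebook(vectors, size, iters=20):
    """Plain k-means (generalized Lloyd) codebook design."""
    assert 0 < size <= len(vectors)
    step = len(vectors) // size
    codebook = [list(vectors[i * step]) for i in range(size)]  # deterministic init
    for _ in range(iters):
        cells = [[] for _ in range(size)]
        for v in vectors:
            cells[nearest(v, codebook)].append(v)
        for k, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codevector
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def train_two_stage(vectors, size1, size2, iters=20):
    """Stage 1 is trained on the input vectors, stage 2 on the stage-1 residuals."""
    stage1 = train_codebook(vectors, size1, iters)
    residuals = [[a - b for a, b in zip(v, stage1[nearest(v, stage1)])] for v in vectors]
    return stage1, train_codebook(residuals, size2, iters)


def encode(x, stage1, stage2):
    """Sequential search: best stage-1 codevector, then best match for the residual."""
    i = nearest(x, stage1)
    j = nearest([a - b for a, b in zip(x, stage1[i])], stage2)
    return i, j


def decode(i, j, stage1, stage2):
    """Reconstruction is the sum of the two selected codevectors."""
    return [a + b for a, b in zip(stage1[i], stage2[j])]
```

The transmitted bits per frame are log2(size1) + log2(size2), for example 12 + 12 = 24 bits with two hypothetical 4096-entry codebooks.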
Based on our experimental results, we propose the two vector quantization schemes with the best performance:
(1) Quantize the LSF’s directly. Because the LSF’s of adjacent analysis frames are highly correlated, the first-stage quantization procedure adds an adaptive codebook, which stores the quantized LSF’s of previous analysis frames, alongside the fixed codebook. The M-Best method is used during the codevector search to increase the likelihood of finding the best-matched codevector.
(2) Before the first-stage quantization procedure, a set of predefined average LSF’s, derived from an LSF training sequence, is subtracted from the LSF’s to obtain a set of LSF differences. We observed that these differences have a smaller dynamic range than the absolute LSF’s. The differences are therefore used as the input of the first-stage quantization procedure. This scheme also adopts the M-Best method to increase the likelihood of finding the best-matched codevector.
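The two schemes above can be sketched together as follows. This is an illustration under stated assumptions, not the thesis’s implementation: the distance is unweighted squared Euclidean (the thesis defines its own distance measure), the adaptive codebook is a first-in first-out list of the most recent quantized first-stage vectors appended to the fixed codebook, and how the stage-1 index is coded once adaptive entries are present is left open. `TwoStageLsfVQ`, `m_best_two_stage`, and the other names are hypothetical.

```python
import heapq


def sq_dist(x, y):
    """Unweighted squared Euclidean distance (an assumption; see lead-in)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean_vector(training):
    """Average LSF vector of a training sequence (used by scheme 2)."""
    return [sum(col) / len(training) for col in zip(*training)]


def m_best_two_stage(x, stage1, stage2, m):
    """Keep the m best stage-1 candidates, quantize each residual with stage 2,
    and return the (i, j) pair with the smallest total error."""
    survivors = heapq.nsmallest(m, range(len(stage1)), key=lambda i: sq_dist(x, stage1[i]))
    best = None
    for i in survivors:
        residual = [a - b for a, b in zip(x, stage1[i])]
        j = min(range(len(stage2)), key=lambda k: sq_dist(residual, stage2[k]))
        err = sq_dist(residual, stage2[j])
        if best is None or err < best[0]:
            best = (err, i, j)
    return best[1], best[2]


class TwoStageLsfVQ:
    """Hypothetical encoder covering both schemes.

    Scheme 1: mean=None, adaptive_size > 0 (fixed + adaptive stage-1 codebook).
    Scheme 2: mean=<average LSF's>, adaptive_size=0 (mean-removed input).
    """

    def __init__(self, fixed1, stage2, m=4, adaptive_size=0, mean=None):
        self.fixed1, self.stage2, self.m = fixed1, stage2, m
        self.adaptive_size, self.mean = adaptive_size, mean
        self.adaptive = []  # recent quantized first-stage vectors, newest first

    def encode(self, lsf):
        target = list(lsf) if self.mean is None else [a - b for a, b in zip(lsf, self.mean)]
        stage1 = self.fixed1 + self.adaptive
        i, j = m_best_two_stage(target, stage1, self.stage2, self.m)
        quantized = [a + b for a, b in zip(stage1[i], self.stage2[j])]
        if self.adaptive_size > 0:  # the decoder must apply the same update
            self.adaptive = ([quantized] + self.adaptive)[: self.adaptive_size]
        if self.mean is None:
            lsf_hat = quantized
        else:
            lsf_hat = [a + b for a, b in zip(quantized, self.mean)]
        return (i, j), lsf_hat
```

With m = 1 the search reduces to the usual sequential search. A larger m costs more computation but can find a better pair when the stage-1 codevector nearest the input is not part of the best overall pair: for x = [0.45], stage-1 codevectors [0.0] and [1.0], and stage-2 codevectors [0.0] and [-0.55], m = 1 returns (0, 0) while m = 2 returns (1, 1).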
Our experimental results show that these two schemes need only 24 and 22 bits/frame, respectively, to achieve quantization transparency, i.e., an average spectral distortion below 1 dB. Both proposed LSF vector quantization schemes also outperform the split codebook method.
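Quantization transparency is judged by spectral distortion (SD): for each frame, the RMS difference in dB between the LP model spectra of the original and quantized coefficients, then averaged over frames. A minimal sketch, assuming unit-gain LP spectra sampled uniformly over the full band (0, π); published evaluations often restrict the frequency band, and only the average-SD condition stated above is checked. The function names are hypothetical.

```python
import cmath
import math


def lp_log_spectrum_db(a, omega):
    """10*log10 of the LP model power spectrum 1/|A(e^{j*omega})|^2, with
    A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p (unit gain assumed)."""
    A = 1.0 + sum(ak * cmath.exp(-1j * omega * (k + 1)) for k, ak in enumerate(a))
    return -20.0 * math.log10(abs(A))


def spectral_distortion_db(a, a_hat, n_points=256):
    """RMS difference (dB) between original and quantized LP spectra of one frame,
    sampled at n_points frequencies spread over (0, pi)."""
    total = 0.0
    for i in range(n_points):
        omega = math.pi * (i + 0.5) / n_points
        d = lp_log_spectrum_db(a, omega) - lp_log_spectrum_db(a_hat, omega)
        total += d * d
    return math.sqrt(total / n_points)


def is_transparent(frames, threshold_db=1.0):
    """frames: list of (a, a_hat) pairs; checks the average-SD condition only."""
    average = sum(spectral_distortion_db(a, q) for a, q in frames) / len(frames)
    return average < threshold_db
```

For an LSF-domain quantizer, the quantized LSF’s would first be converted back to LP coefficients before calling these functions; that conversion is not shown here.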

CONTENTS
Page
ABSTRACT IN CHINESE I
ABSTRACT IN ENGLISH III
ACKNOWLEDGEMENTS V
CONTENTS VI
LIST OF FIGURES VIII
LIST OF TABLES IX
CHAPTER 1 INTRODUCTION 1
1.1 Overview of Speech Coding 1
1.2 Research Motivation 3
1.3 Organization of the Thesis 4
CHAPTER 2 FUNDAMENTALS OF LINEAR PREDICTION AND THE CELP CODER 5
2.1 Introduction 5
2.2 Linear Prediction of Speech Signals 5
2.2.1 Solutions to LPC analysis 8
2.2.2 Line Spectral Frequencies (LSF’s) Representation and Properties 9
2.3 Analysis-By-Synthesis Linear Predictive Speech Coding 15
2.4 Code-Excited Linear Predictive (CELP) Vocoder 17
2.4.1 Long-Term Predictive Filter 19
2.4.2 Perceptual Weighting Filter 21
2.4.3 Excitation Codebook 22
CHAPTER 3 THE PROPOSED SYSTEM ARCHITECTURE 26
3.1 Overview 26
3.2 The Proposed System Architecture 29
3.2.1 Structures of the First Kind of Proposed LSF’s Vector Quantizers 29
3.2.2 Structures of the Second Kind of Proposed LSF’s Vector Quantizers 32
3.3 Distance Measure for Best LSF’s Search 34
CHAPTER 4 SIMULATION RESULTS AND DISCUSSIONS 36
4.1 Speech Quality Assessment 36
4.2 Experimental Results 38
4.2.1 Simulation Results of The First Proposed LSF’s Vector Quantizers 40
4.2.2 Simulation Results of The Second Proposed LSF’s Vector Quantizers 42
4.3 Discussions 43
CHAPTER 5 CONCLUSIONS 45
REFERENCES 47

