臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Author: 吳翊臺
Author (English): Yi-Tai Wu
Title (Chinese): 基於Source-Filter模型的聲音合成及轉換
Title (English): Source-Filter Based Sound Synthesis and its Application to Morphing
Advisor: 蘇文鈺
Advisor (English): Wen-Yu Su
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Computer Science and Information Engineering (master's and doctoral program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2012
Graduation academic year: 100 (ROC calendar, 2011–2012)
Language: English
Pages: 76
Keywords (Chinese): 線性預測編碼、原濾波器模型、真實頻譜包絡計算器、噪音水平計算器、樂器轉換
Keywords (English): Linear Predictive Coding, Source-Filter Model, True Envelope Estimator, Noise Level Estimator, Morphing
Usage statistics:
  • Cited: 0
  • Views: 256
  • Rating: (none)
  • Downloads: 17
  • Bookmarked: 0
How to use digital signal processing techniques to synthesize sounds that closely resemble the original instrument, and then to synthesize performance-level musical phrases, has long been an important topic in music signal analysis. To this end, we first adopted the HMM-Based Speech Synthesis System (HTS); the Hidden Markov Models (HMMs) it employs can effectively handle the concatenation of different timbres. However, we also ran into a number of limitations while using it; for example, the pitch and duration of a note cannot be freely specified by the user.
Although HMMs solve the timbre-concatenation problem, in order to escape these limitations we decided to abandon the HMM-based system (HTS) and instead adopt a Source-Filter synthesis method, Linear Predictive Coding (LPC). In the Source-Filter model, changing the pitch and duration can easily be achieved by modifying the source.
In this thesis, to synthesize sounds similar to the original, we adopt a True Envelope estimator and a Noise Level estimator, combined with LPC, to model the harmonics of the sound and the residue that remains once the timbre is removed, respectively. In addition, we use Filter Coefficients Interpolation so that every output sample has its own filter parameters, which makes the sound more varied.
Besides synthesizing sounds similar to the original, we also apply the system to other tasks, such as morphing between different musical instruments. In our synthesis results, the transition from one instrument to another is very smooth.

Using digital signal processing techniques to synthesize sounds that resemble the original and, further, to produce performance-level musical phrases has long been a principal issue in music signal processing. To this end, we introduce the HMM-Based Speech Synthesis System (HTS), which employs Hidden Markov Models (HMMs) to handle the concatenation of different sounds effectively. However, we encountered many limitations while using HTS; for example, the pitch and duration of a sound cannot be specified by the user.
To remove these limitations, we decided to abandon HTS and to use another Source-Filter method, Linear Predictive Coding (LPC), to synthesize sounds. In the Source-Filter model, the pitch and duration of the synthesized sound can be modified easily by altering the source directly.
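To make the source/filter separation concrete, the following is a minimal frame-level sketch in Python (NumPy/SciPy only); it is not the implementation used in the thesis, and the filter order, window, excitation gain handling, and default parameter values are assumptions chosen for brevity. LPC coefficients are estimated by the autocorrelation method, and the resulting all-pole filter is then driven by a synthetic impulse-train source whose period and length are free parameters, which is how pitch and duration become independently controllable.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_coeffs(frame, order):
    """All-pole (LPC) coefficients of one frame via the autocorrelation method."""
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    a = solve_toeplitz(r[:-1], r[1:])        # solve the Yule-Walker equations
    return np.concatenate(([1.0], -a))       # A(z) = 1 - sum_k a_k z^{-k}

def resynthesize(frame, sr, order=24, f0=440.0, duration=0.5):
    """Re-drive the frame's all-pole filter with a new impulse-train source.

    Pitch (f0) and duration are free parameters of the source; the filter
    keeps the spectral envelope (timbre) of the analyzed frame.
    """
    a = lpc_coeffs(frame * np.hanning(len(frame)), order)
    residual = lfilter(a, [1.0], frame)      # inverse-filter to get the residue
    gain = np.sqrt(np.mean(residual ** 2))   # rough excitation gain
    source = np.zeros(int(duration * sr))
    source[::max(1, int(round(sr / f0)))] = 1.0
    return lfilter([gain], a, source)        # 1/A(z) shapes the new source
```

Changing `f0` moves the spacing of the source impulses (pitch) and changing `duration` changes how many source samples are generated, without touching the filter that carries the timbre.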
In this thesis, to synthesize original-like sounds, we combine the True Envelope and Noise Level estimators with LPC to fit the harmonics and residues of the sound. Furthermore, to capture the dynamic character of the sound more precisely, a Filter Coefficients Interpolation method is used.
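The true-envelope idea can be sketched with iterative cepstral smoothing, as below; this is a simplified illustration under assumed parameter values (FFT size, cepstral order, stopping threshold), not the estimator as tuned in the thesis. The log spectrum is repeatedly replaced by the pointwise maximum of itself and its cepstrally smoothed version, so the envelope converges onto the harmonic peaks instead of averaging through them.

```python
import numpy as np

def true_envelope(frame, n_fft=4096, cep_order=80, n_iter=100, tol_db=0.2):
    """Spectral envelope by iterative cepstral smoothing (simplified sketch).

    Each pass lifts the smoothed curve up onto whatever spectral peaks it
    still misses, then re-smooths, until the envelope covers the log
    spectrum to within tol_db everywhere (or the iteration budget runs out).
    """
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))
    log_spec = np.log(np.maximum(spec, 1e-12))
    target = log_spec.copy()
    env = np.full_like(log_spec, -np.inf)
    tol = tol_db * np.log(10.0) / 20.0           # dB tolerance in log-amplitude units
    for _ in range(n_iter):
        target = np.maximum(target, env)         # push the envelope up onto the peaks
        cep = np.fft.irfft(target, n_fft)        # real cepstrum of the current target
        cep[cep_order:n_fft - cep_order] = 0.0   # low-quefrency liftering = smoothing
        env = np.fft.rfft(cep).real
        if np.max(log_spec - env) < tol:
            break
    return np.exp(env)                           # linear-amplitude envelope
```

Only the envelope part is sketched here; the noise-level estimation and the per-sample filter-coefficient interpolation are separate stages of the pipeline.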
In addition to synthesizing original-like sounds, our system is applied to other tasks, such as morphing from one musical instrument to another. In our synthesis results, the morphing process is smooth and hardly noticeable.
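As a generic illustration of how such a morph can be driven (a minimal sketch, not the procedure of the thesis's morphing chapter), one common approach is to interpolate the two instruments' spectral envelopes frame by frame in the log domain, with a weight that ramps from one instrument to the other across the phrase; the helper below and its `alpha` ramp are assumptions for illustration.

```python
import numpy as np

def morph_envelopes(env_a, env_b, alpha):
    """Frame-wise interpolation of two spectral envelopes (frames x bins).

    alpha is a per-frame weight: 0.0 keeps instrument A's envelope,
    1.0 keeps instrument B's, and values in between blend them in the log domain.
    """
    log_a = np.log(np.maximum(env_a, 1e-12))
    log_b = np.log(np.maximum(env_b, 1e-12))
    return np.exp((1.0 - alpha[:, None]) * log_a + alpha[:, None] * log_b)

# A linear ramp makes the timbre glide smoothly from A to B over the phrase.
# alpha = np.linspace(0.0, 1.0, env_a.shape[0])
# morphed = morph_envelopes(env_a, env_b, alpha)
```

Each morphed envelope would then be turned back into a filter and driven by the common source, as in the synthesis flow described above.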

Abstract (in Chinese)
ABSTRACT
Acknowledgements
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
1.1 BACKGROUND AND MOTIVATION
1.2 OUTLINE OF THE THESIS
2. RELATED WORKS
2.1 VIOLIN
2.2 GUQIN
2.3 HMM-BASED SPEECH SYNTHESIS TOOL FOR VIOLIN AND ITS PROBLEM
2.4 LINEAR PREDICTIVE CODING (LPC)
2.5 TRUE ENVELOPE ESTIMATION
2.6 NOISE LEVEL ESTIMATION
3. METHOD
3.1 HARMONICS AND RESIDUES
3.2 RESIDUES EMPHASIS AND TEMPORAL ENVELOPE
3.3 ANALYSIS AND SYNTHESIS FLOW
3.4 FILTER COEFFICIENTS INTERPOLATION
3.5 TRANSFORM A HIGH ORDER ALL-POLE FILTER INTO BI-QUAD FILTERS
4. EXPERIMENTAL RESULTS
4.1 DATABASE
4.2 SYNTHESIS RESULTS
4.2.1 Violin
4.2.2 Guqin
4.2.3 Twinkle, Twinkle Little Star Variation
4.3 MORPHING
4.4 COMPARISON BETWEEN LPC AND TE-LPC
5. CONCLUSIONS AND FUTURE WORKS
5.1 CONCLUSIONS
5.2 FUTURE WORKS
REFERENCES

