National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

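Graduate student: 方宜彥
Graduate student (English name): Yi-Yan Fang
Thesis title: 低複雜度音訊變速播放系統 (Low-Complexity Variable-Speed Audio Playback System)
Thesis title (English): A Variable Speed Playback System of Audio using the Low Complexity Algorithm
Advisor: 陳福坤
Advisor (English name): Fu Kun Chen
Degree: Master's
Institution: 南台科技大學 (Southern Taiwan University of Science and Technology)
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar, i.e. 2006-2007)
Language: Chinese
Number of pages: 75
Keywords (Chinese, translated): time-scale modification; synchronized overlap-add algorithm; waveform overlap-add algorithm; mean opinion score; signal-to-noise ratio; spectral distance; perceptual weighting
Keywords (English): Time scale modification (TSM); Synchronized OLA (SOLA); Waveform Similarity Overlap and Add (WSOLA); Mean Opinion Score (MOS); Signal-to-Noise Ratio (SNR); Spectral Distance (SD); Perceptual weighting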
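Time-scale modification (TSM) is a technique that changes a talker's speaking rate during playback while preserving the speaker's pitch. This thesis proposes computation-reduction methods for the two algorithms that currently synthesize the best-sounding output: the Synchronized Overlap-Add algorithm (SOLA) [Roucos 1985] and the Waveform Similarity Overlap-and-Add algorithm (WSOLA) [Verhelst 1993]. The proposed low-complexity algorithm (Low Complexity TSM, L-TSM) reduces the computational load by about 80% while preserving the original sound quality. We further propose the Gradual WSOLA (G-WSOLA) algorithm, which reduces the computational complexity of L-WSOLA once more. In addition, because no satisfactory objective criterion yet exists for evaluating time-scale-modified audio, this thesis proposes the Spectrum Similarity Curve (SSC) as an evaluation method. The SSC compares the sound quality of different TSM algorithms with objective figures, and its results agree with those of subjective mean opinion scoring. The thesis then proposes the Perceptually Weighted Curve of Spectrum Similarity (PWCSS), which separates good from poor sound quality more sharply than the SSC; its results likewise agree with the Mean Opinion Score (MOS). The proposed spectrum-similarity evaluation can compare sound quality in fine detail at each point in time, and can also compare in fine detail the relative merits of the TSM algorithms and the methods derived from them. Moreover, it can compare sound quality across different time-scale factors, which no existing objective measure can do. When the SSC is used to objectively measure the sound quality of the proposed low-complexity algorithms, their quality degradation turns out to be insignificant. Finally, we implemented the typical TSM algorithms for practical verification.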
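For readers unfamiliar with WSOLA, here is a minimal sketch of the baseline technique in Python with NumPy. It is not the thesis's code and does not implement L-TSM or G-WSOLA. The function name `wsola`, the 512-sample Hann window with 50% overlap, the ±128-sample search range and the use of plain cross-correlation as the similarity measure are all assumptions chosen for illustration. Each output frame is taken from near its nominal input position, shifted within the search range so that it best matches the natural continuation of the previously copied segment, and then overlap-added.

```python
import numpy as np


def wsola(x: np.ndarray, alpha: float, frame_len: int = 512, tol: int = 128) -> np.ndarray:
    """Minimal WSOLA time-scale modification (illustrative sketch only).

    alpha > 1 lengthens the signal (slower playback), alpha < 1 shortens it.
    Assumed parameters: Hann window, 50% overlap, search range +/- tol samples,
    unnormalised cross-correlation as the waveform-similarity measure.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    x = np.asarray(x, dtype=float)
    hs = frame_len // 2            # synthesis hop: fixed 50% overlap in the output
    ha = hs / alpha                # analysis hop: how far we advance in the input
    win = np.hanning(frame_len)
    # Zero-pad both ends so every candidate segment lies inside the buffer.
    xp = np.concatenate([np.zeros(tol), x, np.zeros(frame_len + hs + 2 * tol)])
    n_frames = int(len(x) / ha) + 1
    y = np.zeros((n_frames - 1) * hs + frame_len)
    wsum = np.zeros_like(y)
    prev = tol                     # start (in xp) of the previously copied segment
    for k in range(n_frames):
        nominal = int(round(k * ha)) + tol   # ideal input position for frame k
        if k == 0:
            start = nominal
        else:
            # Natural continuation of the previously copied segment.
            target = xp[prev + hs : prev + hs + frame_len]
            # All candidate segments starting within +/- tol of the nominal position.
            region = xp[nominal - tol : nominal + tol + frame_len]
            corr = np.correlate(region, target, mode="valid")  # 2*tol + 1 lags
            start = nominal - tol + int(np.argmax(corr))
        out = k * hs
        y[out : out + frame_len] += win * xp[start : start + frame_len]
        wsum[out : out + frame_len] += win
        prev = start
    # Normalise by the summed window, guarding against division by zero.
    y /= np.maximum(wsum, 1e-8)
    return y[: int(round(len(x) * alpha))]
```

For example, `wsola(x, 1.5)` on a one-second 16 kHz tone returns a 1.5-second signal at the same pitch. In this sketch the per-frame cross-correlation search (2*tol + 1 lags per output frame) dominates the cost, which is the kind of computation that complexity-reduction schemes for SOLA and WSOLA aim to cut.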
Time-scale modification (TSM) changes the speaking rate during playback while keeping the same pitch (the speaker's tone). In this thesis, we propose a simple TSM approach, called low-complexity TSM (L-TSM), to reduce the computational complexity of SOLA [Roucos 1985] and WSOLA [Verhelst 1993]. L-TSM reduces the computational load of SOLA and WSOLA by about 80% with perceptually negligible degradation in quality. We also propose another approach, the Gradual WSOLA (G-WSOLA) algorithm, to further reduce the complexity of WSOLA. To evaluate the quality of modified speech, the Mean Opinion Score (MOS) is well known as the best subjective measure and the spectrogram as the best objective one. However, no well-established objective criterion yet exists for evaluating time-scale-modified audio. In this thesis, we propose a novel measure, the Spectrum Similarity Curve (SSC), which objectively evaluates the quality of TSM-modified speech against the original speech along the time axis. Furthermore, we propose the Perceptually Weighted Curve of Spectrum Similarity (PWCSS), which evaluates the quality of TSM-modified speech in a way closer to human perception. The proposed SSC methods can compare the quality of TSM-modified speech produced by different TSM algorithms, and their results match the MOS results. Using the PWCSS measure, we show the quality degradation of the two proposed simplified TSM approaches, L-TSM and G-WSOLA. Finally, the TSM approaches discussed are implemented as a Windows software application.
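The abstract does not give the SSC formula, so the following is only a hypothetical illustration of the general idea of comparing a time-scaled signal with the original frame by frame along the time axis. Each frame of the modified signal is mapped back to the same instant of the original by dividing its position by the time-scale factor `alpha`, and the cosine similarity of the two Hann-windowed magnitude spectra is recorded. The function name `spectrum_similarity_curve`, the frame and hop sizes, and the cosine-similarity measure are assumptions, not the thesis's definitions; no perceptual weighting (as in PWCSS) is attempted.

```python
import numpy as np


def spectrum_similarity_curve(x, y, alpha, frame_len=512, hop=256):
    """Hypothetical per-frame spectral similarity (NOT the thesis's exact SSC).

    x: original signal; y: time-scaled signal with len(y) ~= alpha * len(x).
    Returns one score per frame of y: the cosine similarity between the
    magnitude spectrum of that frame and of the time-aligned frame of x.
    1.0 means identical spectral shape; lower values mean more distortion.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    win = np.hanning(frame_len)
    scores = []
    for start in range(0, len(y) - frame_len + 1, hop):
        src = int(round(start / alpha))      # same instant on the original time axis
        if src + frame_len > len(x):
            break
        mag_x = np.abs(np.fft.rfft(win * x[src : src + frame_len]))
        mag_y = np.abs(np.fft.rfft(win * y[start : start + frame_len]))
        nx, ny = np.linalg.norm(mag_x), np.linalg.norm(mag_y)
        if nx == 0.0 and ny == 0.0:
            scores.append(1.0)               # both frames silent: treat as identical
        elif nx == 0.0 or ny == 0.0:
            scores.append(0.0)               # only one side silent: no spectral overlap
        else:
            scores.append(float(mag_x @ mag_y) / (nx * ny))
    return np.array(scores)
```

Plotting the returned array against frame time gives a curve along the time axis, so outputs of several TSM algorithms, or of one algorithm at several time-scale factors, can be compared on the same input frame by frame.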
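Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 Audio Playback and Applications
1.2 Audio Time-Scale Modification
1.3 Time-Scale Modification (TSM) Algorithms
1.4 Evaluation of TSM Algorithms
1.5 Thesis Outline
Chapter 2 Audio Time-Scale Modification (TSM) Algorithms
2.1 Overlap-Add Algorithm (OLA)
2.2 Synchronized Overlap-Add Algorithm (SOLA)
2.3 Waveform Similarity Overlap-Add Algorithm (WSOLA)
2.4 Time-Domain Pitch-Synchronous Overlap-Add Algorithm (TD-PSOLA)
2.5 Simplified TSM Algorithms
2.5.1 Using One-Bit Operations to Reduce Computation [Wilson 2001]
2.5.2 Using Zero-Crossing Information to Reduce Data Dimensionality [Darragh 2005]
Chapter 3 Sound Quality Evaluation for Time-Scale Modification
3.1 Subjective Quality Evaluation
3.2 Objective Quality Evaluation
3.2.1 Signal-to-Noise Ratio (SNR)
3.2.2 Spectrogram
3.2.3 Spectral Distance (SD)
3.2.4 Mean Squared Difference over the Overlap Region (Overlap MSD)
3.3 Objective Evaluation by Spectrum Similarity
3.4 Sound Quality Evaluation with Auditory Perceptual Weighting
Chapter 4 Low-Complexity Algorithms for Variable-Speed Audio Playback
4.1 Log-SOLA and Log-WSOLA Algorithms
4.1.1 Log-SOLA
4.1.2 Log-WSOLA
4.2 Low-Complexity Algorithms
4.2.1 L-SOLA
4.2.2 L-WSOLA
4.2.3 Sound Quality Evaluation
4.2.4 Computational Complexity Evaluation
4.3 Gradual Search Waveform Similarity Overlap-Add Algorithm
4.3.1 Gradual Search Waveform Similarity Overlap-Add Algorithm
4.3.2 Sound Quality Evaluation
Chapter 5 Implementation of the Variable-Speed Audio Playback System
5.1 System Design
5.2 System Interface and Usage
Chapter 6 Conclusions
References
Related Links
About the Author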
[Darragh 2005] D. Ballesty and R. D. Gallery, Audio signal time scale modification, United States Patent Application Publication, No. US6944510B1, Sep. 2001.
[Doman 2003] D. Doman, R. Lawlor and E. Coyle, Time-scale modification of speech using a synchronised and adaptive overlap-add (SAOLA) algorithm, Audio Engineering Society (AES) 114th Convention, Amsterdam, Netherlands, Mar. 2003.
[Gokhale 2003] R. S. Gokhale, Packet loss concealment in voice over internet, Master's Thesis, University of South Florida, Jul. 31, 2003.
[Ho 2005] H. S. Sung, D. H. Hwang, M. Lee, Y. C. Park, K. S. Lee and D. H. Youn, Speech restoration system and method for concealing packet losses, United States Patent Application Publication, No. US2005/0010401A1, Jan. 2005.
[Lawlor 1990] B. Lawlor and A. D. Fagan, A novel efficient algorithm for audio time-scale modification, Irish Signals and Systems Conference '99, National University of Ireland, Galway, 1999.
[Michael 2002] M. J. Prerau, Slow motion sound: Implementing time expansion/compression with a phase vocoder, Dec. 16, 2002.
[Mouline 1990] E. Moulines and F. Charpentier, Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, Speech Communication, Vol. 9, pp. 453-467, 1990.
[Portnoff 1981] M. Portnoff, Short-time Fourier analysis of sampled speech, IEEE Trans. Acoust., Speech, Signal Process., Vol. 29, No. 3, pp. 364-373, 1981.
[Roucos 1985] S. Roucos and A. Wilgus, High quality time scale modification for speech, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 493-496, 1985.
[Verhelst 1993] W. Verhelst and M. Roelands, An overlap-add technique based on waveform similarity (WSOLA) for high quality time scale modification of speech, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 2, Apr. 27-30, 1993.
[Wah 1998] W. H. Wah, Variable speed playback system for speech and audio signals (and topics in video processing), Master's Thesis, MIT, Aug. 1998.
[Werner 1999] W. Verhelst, Overlap-add methods for time-scaling of speech, May 17, 1999.
[Wilson 2001] D. L. Wilson and J. L. Wayman, Synchronized overlap add voice processing using windows and one bit correlators, United States Patent Application Publication, No. US6173255B1, Jan. 2001.
[Wong 1997] H. W. Wong, O. C. Au, W. C. Wong and H. P. Lau, On improving the intelligibility of synchronized overlap-and-add (SOLA) at low TSM factor, Proc. IEEE Region 10 Conference (TENCON), Brisbane, Australia, Dec. 1997.
[Wong 1998] W. C. Wong, O. C. Au and H. W. Wong, Fast time scale modification using envelope-matching technique (EM-TSM), Proc. IEEE International Symposium on Circuits and Systems, Vol. 5, pp. 550-553, May 1998.
[王 2004] 王小川, 語音訊號處理 (Speech Signal Processing), 全華, 2004.