Student: Hui-Yu Tseng (曾惠虞)
Title: Extracting and Modifying the Spatial Information in Stereo Audio (立體音訊中空間資訊之擷取與修改)
Advisor: Chia-Ming Chang (張嘉銘)
Degree: Master's
Institution: Tatung University
Department: Department of Computer Science and Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (2005–2006)
Language: English
Pages: 44
Keywords (Chinese): 音場合成、空間資訊、立體音
Keywords (English): sound field synthesis, spatial information, stereo
This thesis proposes a method that takes the stereo signal received on two channels, extracts the spatial information of the original sound field together with a single representative source, and resynthesizes the field from them. The goal is to be able to synthesize an appropriate sound field under different listening conditions.

The situation we consider is a recording in which a group of identical instruments is arranged in a line at equal spacing. Because every source plays the same piece, that is, the same notes in the same time sequence, and because the human ear is insensitive to the phase of sound, we assume that the frequency energies in each source's two-dimensional time–frequency representation (spectrogram) are similar. The signal received by the microphone can therefore be viewed as the result of shifting, attenuating, and summing copies of a single spectrogram. This pickup process is analogous to the uniform linear degradation that blurs an image when the scene moves continuously while the shutter is open. We therefore apply image-processing concepts to estimate the spatial parameters and recover a signal that represents the original single source; the other, similar sources can then be generated from this extracted signal.
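The pickup model above can be sketched numerically. This is only an illustrative sketch under the stated assumptions, not code from the thesis; `degrade`, `delays`, and `gains` are hypothetical names. It shows that summing shifted, attenuated copies of one magnitude spectrogram is the same operation as convolving each frequency row with a sparse 1-D kernel, which is exactly the form of a uniform motion-blur filter:

```python
import numpy as np

def degrade(S, delays, gains):
    """Sum shifted, attenuated copies of the source spectrogram S."""
    F, T = S.shape
    pad = max(delays)
    Y = np.zeros((F, T + pad))
    for d, g in zip(delays, gains):
        Y[:, d:d + T] += g * S           # delay by d frames, scale by g
    return Y

# The same operation is a 1-D convolution of every frequency row with a
# sparse kernel h whose taps sit at the delays -- the motion-blur analogy.
S = np.abs(np.random.randn(4, 16))       # toy magnitude spectrogram
delays, gains = [0, 2, 4], [1.0, 0.8, 0.6]
h = np.zeros(max(delays) + 1)
h[delays] = gains
Y = degrade(S, delays, gains)
Y_conv = np.stack([np.convolve(row, h) for row in S])
print(np.allclose(Y, Y_conv))            # prints True: both forms agree
```

The equivalence is what licenses the image-restoration viewpoint: estimating the kernel taps amounts to estimating the spatial parameters.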

Once the spatial parameters and the source have been extracted, a sound field close to the original can be synthesized from them; the spatial parameters can also be modified to synthesize a new sound field. In this way the synthesis can meet the user's spatial requirements without being constrained by differences between the original recording conditions and the playback conditions.
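A minimal sketch of this resynthesis step, assuming the source spectrogram and the spatial parameters have already been extracted (`resynthesize` and its arguments are hypothetical names, not the thesis's code): each output channel is rebuilt by placing delayed, attenuated copies of the single source, and modifying the delays and gains yields a new sound field.

```python
import numpy as np

def resynthesize(S, channel_params):
    """Rebuild each output channel from the source spectrogram S.

    channel_params: one (delays, gains) pair per output channel.
    """
    F, T = S.shape
    channels = []
    for delays, gains in channel_params:
        Y = np.zeros((F, T + max(delays)))
        for d, g in zip(delays, gains):
            Y[:, d:d + T] += g * S       # place a delayed, attenuated copy
        channels.append(Y)
    return channels

S = np.abs(np.random.randn(4, 16))       # recovered source (toy stand-in)
# Modified spatial parameters, e.g. a wider virtual source layout:
left, right = resynthesize(S, [([0, 4], [1.0, 0.5]),
                               ([4, 0], [0.5, 1.0])])
print(left.shape, right.shape)           # prints (4, 20) (4, 20)
```

Because the parameters are explicit, the same recovered source can serve any playback geometry, which is the flexibility the paragraph above describes.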

In this thesis, simulations are used to examine the feasibility of this approach. The results show that the two-dimensional time–frequency spectrogram of sound, combined with the concept of image degradation/restoration, can be applied to extracting spatial information from a sound field and resynthesizing the field. Applying this decomposition-and-resynthesis concept to multi-channel processing would also yield a certain degree of compression.
In this thesis, a method is proposed to extract the spatial information and a single representative source from the original sound field of a stereo recording, and then to resynthesize the field as demanded. The objective is to synthesize an appropriate sound field for varying listening conditions.

The situation discussed is that of multiple sources playing the same melody on the same kind of instrument, aligned in a line. Since each source plays the same melody, the same notes are played in the same time sequence, and human perception is insensitive to the phase of audio. We may therefore assume that the spectrogram magnitudes of the sources are similar even though their waveforms differ. The signal received by a microphone can then be treated as the summation of one spectrogram with shifts in time and attenuation, which is analogous to an image corrupted by a motion-blur function. Thus, the concept of image restoration may be applied to extract the spatial information and a single representative source that captures the time–frequency characteristics of each original source. A sound field similar to the original can be synthesized from the extracted source and the obtained spatial information, and the spatial information can also be modified to synthesize a different sound field for different playback conditions as desired.
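The restoration side of the analogy can be sketched as row-wise Wiener deconvolution along the time axis of the spectrogram. This is not the thesis's implementation: `wiener_restore` and its parameters are hypothetical names, and the blur kernel `h` (the delays and attenuations) is assumed known here, whereas the thesis must also estimate it from the recording.

```python
import numpy as np

def wiener_restore(Y, h, K=1e-6):
    """Deconvolve each frequency row of Y by the blur kernel h."""
    n = Y.shape[1]
    H = np.fft.rfft(h, n)                    # frequency response of the blur
    W = np.conj(H) / (np.abs(H) ** 2 + K)    # Wiener inverse filter
    return np.fft.irfft(np.fft.rfft(Y, axis=1) * W, n, axis=1)

# Toy check: blur a known source spectrogram along time (circularly, for
# simplicity), then recover it with the same kernel.
rng = np.random.default_rng(0)
S = np.abs(rng.standard_normal((4, 32)))
h = np.zeros(32)
h[[0, 3, 6]] = [1.0, 0.7, 0.5]               # delays and attenuations
Y = np.fft.irfft(np.fft.rfft(S, axis=1) * np.fft.rfft(h, 32), 32, axis=1)
S_hat = wiener_restore(Y, h)
print(np.max(np.abs(S_hat - S)))             # residual is small
```

The regularization constant K plays the usual Wiener-filter role of keeping the inverse stable where the blur response is weak.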

Simulations are performed to confirm the proposed method. The results show that the concept of the image degradation/restoration process, applied to the sound spectrogram, can be used for spatial-information extraction and sound field resynthesis. Applying the decomposition-and-resynthesis concept to multi-channel processing should also yield a certain compression effect in the future.
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
CHINESE ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
1 INTRODUCTION 1
1.1 Motivation and Objective . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Paper Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 RELATED RESEARCH 5
2.1 Source Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Direction of Arrival (DOA) . . . . . . . . . . . . . . . . . . . 5
2.2 Sound Source Separation . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Delay-and-Sum Method . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Sound Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 Up-Mix System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.6 Image Degradation/Restoration Process . . . . . . . . . . . . . . . . 19
2.6.1 Image Degradation . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6.2 Image Restoration . . . . . . . . . . . . . . . . . . . . . . . . 20
3 SYSTEM ORGANIZATION 23
3.1 Model Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 Model Presumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Sound Source Distortion . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3.1 Extraction of Spatial Information and Single Source . . . . . . 30
3.4 Sound Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4 SIMULATION AND EXPERIMENT 33
4.1 Simulation Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1.1 Efficiency Evaluation . . . . . . . . . . . . . . . . . . . . . . 36
4.1.2 SNR Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1.3 Subjective Test . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2.1 Spatial Information Extraction . . . . . . . . . . . . . . . . . 39
4.2.2 Spectrogram Issues . . . . . . . . . . . . . . . . . . . . . . . . 39
5 CONCLUSIONS AND FUTURE WORK 41
5.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41