



Graduate student (English name): Jen-Nan Yu
Thesis title (English): Design and Implementation of a Microphone Array Based Speaker Recognition System
Advisor (English name): Ching-Han Chen
Keywords (English): Speaker Recognition; Microphone Array; Probabilistic Neural Network
This study designs an embedded speaker identification system based on a microphone array, with the aim of improving on the performance of single-microphone identification systems. The system consists of four modules: sound signal acquisition from the microphone array, beamforming, speaker feature extraction, and speaker identification. The signal acquisition module collects speech with a circular array of Micro-Electro-Mechanical System (MEMS) microphones. The beamforming module enhances the speech signal and suppresses background noise through multi-channel processing. The feature extraction module represents each speaker's voice characteristics with Linear Predictive Cepstral Coefficients (LPCC), and a Probabilistic Neural Network (PNN) classifier identifies the speaker. To validate the system, we built an experimental speech database of 120 recordings of the same statements spoken by twelve people. During training, the recognition rate was optimized by tuning the PNN smoothing parameters and the beamforming parameters. Test results showed that the microphone-array system reduced the error rate by about 10% compared with the single-microphone system.
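The two processing stages named in the abstract, beamforming and PNN classification, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names `delay_and_sum` and `pnn_classify` are hypothetical, the per-microphone delays are assumed to be known integer sample counts (the outline indicates the thesis estimates them with GCC-PHAT), the feature vectors are assumed to be precomputed (for example LPCC vectors), and a Gaussian kernel with a single smoothing parameter `sigma` is assumed for the PNN.

```python
import math


def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer for integer sample delays.

    channels: list of equal-length sample lists, one per microphone.
    delays:   assumed-known arrival delay of each channel, in samples (>= 0).
    """
    n_mics = len(channels)
    n_samples = len(channels[0])
    out = [0.0] * n_samples
    for signal, d in zip(channels, delays):
        # Advance the channel by d samples so all copies of the source align.
        for i in range(n_samples - d):
            out[i] += signal[i + d]
    # Averaging keeps the aligned source at its original amplitude,
    # while uncorrelated noise tends to average down.
    return [v / n_mics for v in out]


def pnn_classify(x, train_features, train_labels, sigma=0.5):
    """Probabilistic neural network (Parzen-window) classifier.

    x:              feature vector to classify (e.g. LPCC, assumed precomputed).
    train_features: list of training feature vectors.
    train_labels:   speaker label for each training vector.
    sigma:          smoothing parameter of the Gaussian kernel.
    """
    sums = {}
    counts = {}
    for feat, label in zip(train_features, train_labels):
        # Pattern layer: one Gaussian kernel per training vector.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, feat))
        k = math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Summation layer: accumulate kernel responses per class.
        sums[label] = sums.get(label, 0.0) + k
        counts[label] = counts.get(label, 0) + 1
    # Decision layer: pick the class with the largest average response.
    return max(sums, key=lambda c: sums[c] / counts[c])
```

For instance, `delay_and_sum([[0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]], [0, 1])` shifts the second channel forward by one sample so that both impulses coincide before averaging. The `sigma` argument plays the role of the smoothing parameter that the abstract says was tuned during training: a small value makes the classifier behave like nearest-neighbor matching, while a large value blurs the class densities together.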
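Chapter 1 Introduction
1.1 Research Motivation
1.2 Literature Review
1.3 Thesis Organization
Chapter 2 MEMS Microphone Array Beamforming
2.1 MEMS Microphones
2.1.1 Operating Principle of MEMS Microphones
2.1.2 Types of MEMS Microphones
2.1.3 Microphone Directivity
2.2 Microphone Arrays
2.2.1 Linear Microphone Arrays
2.2.2 Circular Microphone Arrays
2.3 Beamforming Algorithms
2.3.1 Delay-and-Sum Beamformer
2.3.2 Estimating TDOA (Time Difference of Arrival) with GCC-PHAT
2.4 Sound Source Direction Estimation Algorithms
2.4.1 TDOA-Based Sound Source Direction Estimation
2.5 Feature Extraction
2.5.1 Preprocessing
2.5.2 Linear Predictive Cepstral Coefficients (LPCC)
2.6 Probabilistic Neural Network (PNN) Classifier
2.6.1 PNN Architecture
Chapter 3 Microphone Array Speaker Recognition System
3.1 System Architecture
3.1.1 Sound Signal Acquisition
3.1.2 Beamforming
3.1.3 Speech Feature Extraction
3.1.4 Speaker Recognition
3.2 Discrete-Event System Modeling
3.2.1 Modeling the Microphone Array Speaker Recognition System
3.2.2 Modeling Sound Signal Acquisition
3.2.3 Modeling Beamforming
3.2.4 Modeling Speech Feature Extraction
3.2.5 Modeling Speaker Recognition
3.2.6 Main States and Actions
3.3 Software Synthesis
3.3.1 Software Synthesis of the Microphone Array Speaker Recognition System Model
3.3.2 Software Synthesis of the Sound Signal Acquisition Model
3.3.3 Software Synthesis of the Beamforming Model
3.3.4 Software Synthesis of the Speech Feature Extraction Model
3.3.5 Software Synthesis of the Speaker Recognition Model
3.3.6 Software Simulation
Chapter 4 System Integration Experiments and Verification
4.1 Experimental Environment
4.1.1 Overview of the STM32F429 Discovery Board Specifications
4.1.2 Overview of the MEMS Microphone Specifications
4.2 Experiments
4.2.1 Data Collection from Test Subjects
4.2.2 Training of Samples and Parameters for the Microphone Array Speaker Recognition System
4.3 Speaker Recognition Performance Evaluation
4.3.1 Speaker Recognition Performance with a Single Microphone
4.3.2 Speaker Recognition Performance with the Microphone Array
4.4 Experimental Results and Discussion
Chapter 5 Conclusion
References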