National Digital Library of Theses and Dissertations in Taiwan

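Author: Yeshanew Ale
Thesis title (Chinese): 以英文關鍵詞進行語者性別與口音辨識的混合CNN-SVM模型
Thesis title (English): Hybrid Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) Model for Speakers’ Gender and Accent Recognition using English Keywords
Advisor: Kuang-Yow Lian (練光祐)
Oral defense committee: Kuang-Yow Lian (練光祐), Cheng-Ming Huang (黃正民), Peter Liu (劉寅春)
Oral defense date: 2019-01-22
Degree: Master's
Institution: National Taipei University of Technology
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2019
Graduation academic year: 107 (ROC calendar)
Language: English
Number of pages: 54
Keywords: Accent Recognition; SVM; CNN; hybrid CNN-SVM; Spectrogram; Gender accent; ASR; Dropout; data augmentation; Overfitting; Keywords recognition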
Speaker accent recognition, speech-to-text conversion, and their applications have become popular research areas worldwide. Although English is spoken natively in countries such as the USA, Australia, and the United Kingdom, people in most other countries speak it as a non-native language. A speaker's accent strongly influences communication between people from different countries. This thesis addresses speaker accent recognition, gender accent recognition, and isolated keyword recognition using spoken keywords from non-native English speakers. It also identifies the accent similarity between speakers from non-native English-speaking countries and native English speakers, and investigates the accent similarity or gap among speakers from three countries: Ethiopia, India, and Taiwan. A hybrid model of Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) is proposed for these voice recognition applications. The spectrogram of the speech signal serves as the first-stage feature extractor, the CNN as the second-stage feature extractor, and the SVM as the classifier. Once the voice is converted to a spectrogram, recognition follows the working principle of image recognition: the CNN extracts features from the spectrogram image representation of speech, and the SVM is applied to the extracted features. The performance of the hybrid CNN-SVM model is evaluated by comparing it with the CNN and SVM models alone. The hybrid CNN-SVM model converges quickly and reduces overfitting compared with the CNN model alone. The results show that the proposed system carries out multiple tasks at the same time and achieves high recognition accuracy.
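The pipeline in the abstract (spectrogram, then convolutional feature extraction, then an SVM classifier) can be illustrated with a small sketch. This is not the thesis's implementation: the record gives no architecture or hyperparameters, so everything here is an assumption chosen for illustration. In particular, three fixed 3x3 filters stand in for the filters a trained CNN would learn, the SVM is a linear one trained by subgradient descent on the hinge loss, and the data are synthetic tones rather than recorded keywords. All function names (`spectrogram`, `conv_features`, `train_linear_svm`, `make_toy_data`) are hypothetical.

```python
import numpy as np

def spectrogram(signal, frame_len=64, hop=32):
    """Log-magnitude spectrogram from a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    # Rows are frequency bins, columns are time frames, like a grayscale image.
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T

# ASSUMPTION: fixed 3x3 filters stand in for filters a trained CNN would learn.
KERNELS = np.array([
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],        # local average
    [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],  # horizontal band detector
    [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],  # vertical line detector
], dtype=float) / 9.0

def conv_features(image, kernels=KERNELS):
    """One convolution layer, ReLU, max-pool over time, then flatten."""
    k = kernels.shape[-1]
    patches = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    # Cross-correlation of every 3x3 patch with every kernel, as in CNN layers.
    maps = np.einsum("hwij,nij->nhw", patches, kernels)
    maps = np.maximum(maps, 0.0)     # ReLU
    return maps.max(axis=2).ravel()  # pool across the time axis, keep frequency

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM: batch subgradient descent on the
    L2-regularized hinge loss. Labels must be -1 or +1."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples that violate the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Sign of the SVM decision function, as -1 or +1."""
    return np.where(X @ w + b >= 0, 1, -1)

def make_toy_data(n_per_class, rng, sample_rate=8000, n_samples=1024):
    """Synthetic stand-in data: noisy low tones (-1) versus noisy high tones (+1)."""
    t = np.arange(n_samples) / sample_rate
    X, y = [], []
    for label, (lo, hi) in ((-1, (400, 600)), (1, (1900, 2100))):
        for _ in range(n_per_class):
            tone = np.sin(2 * np.pi * rng.uniform(lo, hi) * t)
            noisy = tone + 0.1 * rng.standard_normal(n_samples)
            X.append(conv_features(spectrogram(noisy)))
            y.append(label)
    return np.array(X), np.array(y, dtype=float)

rng = np.random.default_rng(0)
X_train, y_train = make_toy_data(20, rng)
X_test, y_test = make_toy_data(10, rng)
w, b = train_linear_svm(X_train, y_train)
accuracy = float(np.mean(predict(X_test, w, b) == y_test))
print(f"held-out accuracy on toy tones: {accuracy:.2f}")
```

In a hybrid model of the kind the abstract describes, the convolutional layers would first be trained on labeled spectrograms and their learned activations passed to the SVM; the fixed filters above only show where that feature extractor sits in the data flow.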


ABSTRACT
ACKNOWLEDGMENTS
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Introduction
1.2 Motivation
1.3 Speech Recognition’s Algorithms
1.4 Major contributions
1.5 Organization of the Thesis
Chapter 2 Literature Review
2.1 Introduction
2.2 Related Works
Chapter 3 Proposed System Design and Discussion
3.1 Dataset Preparation
3.2 Feature Extraction
3.2.1 Feature Extraction Parameters
3.2.2 Grayscale representation of spectrogram
3.3 Preprocess Techniques
3.4 Neural Network Overview
3.5 CNN Overviews
3.6 System Architectures and Discussion
3.6.1 CNN
3.6.2 Detail Discussion of each layer of CNN model
3.7 SVM classification algorithm
3.8 Hybrid CNN-SVM classification algorithm
Chapter 4 Experimental Results
4.1 Experimental Results
4.1.1 Keywords’ voice recognition results
4.1.2 Speakers’ accent recognition using Keywords
4.1.3 Gender accent recognition using Keywords
4.1.4 Native and non-native English speakers accent similarity
4.1.5 Training time comparison
Chapter 5 Conclusion and Future works
5.1 Conclusion
5.2 Future works
References


[1] Registrar General, India. "Census of India 2011: provisional population totals-India data sheet." Office of the Registrar General Census Commissioner, India. Indian Census Bureau (2011).
[2] W. V. Chiung. "Language and ethnic identity in Taiwan." In 7th Annual North American Taiwan Studies Conference, June, pp. 23-25. 2001.
[3] S. J. Arora, and R. P. Singh. "Automatic speech recognition: a review." International Journal of Computer Applications 60, no. 9 (2012).
[4] P. Saini, and P. Kaur. "Automatic speech recognition: A review." International Journal of Engineering Trends and Technology 4, no. 2 (2013): 1-5.
[5] Y. Ma, M. Paulraj, S. Yaacob, A. Shahriman, and S. K. Nataraj. "Speaker accent recognition through statistical descriptors of Mel-bands spectral energy and neural network model." In Sustainable Utilization and Development in Engineering and Technology (STUDENT), 2012 IEEE Conference on, pp. 262-267. IEEE, 2012.
[6] J. Vajpai, and A. Bora. "Industrial Applications of Automatic Speech Recognition Systems." International Journal of Engineering Research and Applications 6, no. 3 (2016): 88-95.
[7] E. Tverdokhleb, H. Dobrovolskyi, N. Keberle, and N. Myronova. "Implementation of accent recognition methods subsystem for eLearning systems." In Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), 2017 9th IEEE International Conference on, vol. 2, pp. 1037-1041. IEEE, 2017.
[8] J. Padmanabhan, and M. J. J. Premkumar. "Machine learning in automatic speech recognition: A survey." IETE Technical Review 32, no. 4 (2015): 240-251.
[9] A. Bhandare, M. Bhide, P. Gokhale, and R. Chandavarkar. "Applications of Convolutional Neural Networks." International Journal of Computer Science and Information Technologies (2016): 2206-2215.
[10] C. Teixeira, I. Trancoso, and A. Serralheiro. "Accent identification." In Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on, vol. 3, pp. 1784-1787. IEEE, 1996.
[11] V. Boddapati, A. Petef, J. Rasmusson, and L. Lundberg. "Classifying environmental sounds using image recognition networks." Procedia Computer Science 112 (2017): 2048-2056.
[12] S. K. Gouda, S. Kanetkar, D. Harrison, and M. K. Warmuth. "Speech Recognition: Keyword Spotting Through Image Recognition." arXiv preprint arXiv:1803.03759 (2018).
[13] M. Elleuch, R. Maalej, and M. Kherallah. "A new design based-SVM of the CNN classifier architecture with dropout for offline Arabic handwritten recognition." Procedia Computer Science 80 (2016): 1712-1723.
[14] S. Deshpande, S. Chikkerur, and V. Govindaraju. "Accent classification in speech." In Automatic Identification Advanced Technologies, 2005. Fourth IEEE Workshop on, pp. 139-143. IEEE, 2005.
[15] X. Niu. "Fusions of CNN and SVM Classifiers for Recognizing Handwritten Characters." Ph.D. diss., Concordia University, 2011.
[16] L. M. Arslan, and J. H. Hansen. "Language accent classification in American English." Speech Communication 18, no. 4 (1996): 353-367.
[17] D. Dhanashri, and S. B. Dhonde. "Isolated Word Speech Recognition System Using Deep Neural Networks." In Proceedings of the International Conference on Data Engineering and Communication Technology, pp. 9-17. Springer, Singapore, 2017.
[18] K. Saeed, and M. K. Nammous. "A speech-and-speaker identification system: feature extraction, description, and classification of the speech-signal image." IEEE Transactions on Industrial Electronics 54, no. 2 (2007): 887-897.
[19] R. N. Tak, D. M. Agrawal, and H. A. Patil. "Novel Phase Encoded Mel Filterbank Energies for Environmental Sound Classification." In International Conference on Pattern Recognition and Machine Intelligence, pp. 317-325. Springer, Cham, 2017.
[20] A. M. Noelia. "Speech analysis for automatic speech recognition." Master's thesis, Institutt for elektronikk og telekommunikasjon, 2009.
[21] F. Longueira, and S. Keene. "A Fully Convolutional Neural Network Approach to End-to-End Speech Enhancement." arXiv preprint arXiv:1807.07959 (2018).
[22] J. Levis, and R. Suvorov. "Automatic speech recognition." The Encyclopedia of Applied Linguistics (2012).
[23] G. E. Nasr, E. A. Badr, and C. Joun. "Cross entropy error function in neural networks: Forecasting gasoline demand." In FLAIRS Conference, pp. 381-384. 2002.
[24] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. "Dropout: a simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15, no. 1 (2014): 1929-1958.
[25] J. S. Prakash, K. A. Vignesh, C. Ashok, and R. Adithyan. "Multiclass Support Vector Machines classifier for machine vision application." In Machine Vision and Image Processing (MVIP), 2012 International Conference on, pp. 197-199. IEEE, 2012.
[26] S. Amarappa, and S. V. Sathyanarayana. "Data classification using Support Vector Machine (SVM), a simplified approach." International Journal of Electronics and Computer Science Engineering. ISSN-2277-1956 (2014).
[27] B. T. Smith. "Lagrange multipliers tutorial in the context of support vector machines." Memorial University of Newfoundland, St. John’s, Newfoundland, Canada (2004).

