
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 蘇奕銘 (Yi-Ming Su)
Title (Chinese): 應用MLP、RBF及DNN類神經網路方法於中文母音辨識
Title (English): Applying MLP, RBF, and Deep Neural Networks to Mandarin Vowel Recognition
Advisor: 李宗寶 (Lee, Chung-Bow)
Committee Members: 郭仁泰, 張國清
Oral Defense Date: 2016-06-06
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Graduate Institute of Statistics (統計學研究所)
Discipline: Mathematics and Statistics
Field: Statistics
Document Type: Academic thesis
Publication Year: 2016
Graduation Academic Year: 104 (2015-2016)
Language: English
Keywords (Chinese): 機器學習 (machine learning), 語音辨識 (speech recognition), 類神經網路 (neural network), 多層感知機 (multilayer perceptron), 徑向基底函數網路 (radial basis function network), 深層類神經網路 (deep neural network), 自動編碼器 (autoencoder)
Keywords (English): Machine Learning, Speech Recognition, Isolated Word, Neural Network, MLP, RBFN, DNN, Autoencoder
Usage statistics:
  • Cited by: 4
  • Views: 1182
  • Downloads: 217
  • Bookmarked: 0
Abstract (Chinese): This thesis compares three neural network methods for Mandarin vowel recognition: the multilayer perceptron (MLP), the radial basis function network (RBFN), and the deep neural network (DNN). The recorded speech data first undergo a series of preprocessing steps; Mel-frequency cepstral coefficients (MFCCs) are then extracted from the speech signals and used as the model inputs. Experiments show that the MLP and the DNN achieve similar recognition rates, and both outperform the RBFN. We also examine the characteristics of each method: the MLP and the RBFN improve their recognition rates rapidly in the early stage of learning, while the DNN already attains a good recognition rate after pretraining, before supervised learning begins. To further raise the recognition rate, we also test modifications of the existing algorithms. The results show that, with a suitable model design, the vowel recognition rate can exceed 95%. The novel model architecture proposed in this thesis also substantially shortens the training time of the neural networks.
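The MFCC front end the abstract describes (and that Chapter 2 of the table of contents walks through: pre-emphasis, frame blocking, Hamming windowing, DFT, triangular Mel bandpass filters, log-energy, and DCT) can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis implementation; the sampling rate, frame length, hop size, filter count, and number of coefficients below are assumed common defaults, not values taken from the thesis.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Pre-emphasis: boost high frequencies (0.97 is a common default).
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame blocking, then a Hamming window on each frame.
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = x[idx] * np.hamming(frame_len)
    # Power spectrum via the DFT.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular bandpass filters spaced evenly on the Mel scale.
    mel_max = 2595.0 * np.log10(1.0 + (sr / 2) / 700.0)
    hz = 700.0 * (10.0 ** (np.linspace(0, mel_max, n_filters + 2) / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:ctr] = (np.arange(lo, ctr) - lo) / (ctr - lo)
        fbank[m - 1, ctr:hi] = (hi - np.arange(ctr, hi)) / (hi - ctr)
    # Log filter-bank energies; the DCT decorrelates them into MFCCs.
    log_e = np.log(power @ fbank.T + 1e-10)
    return dct(log_e, type=2, axis=1, norm='ortho')[:, :n_ceps]

# Example: 13 MFCCs per 25 ms frame for one second of (random) 16 kHz audio.
feats = mfcc(np.random.randn(16000))   # shape: (n_frames, 13)
```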
Abstract (English): This thesis compares vowel recognition of Mandarin isolated words using different neural network architectures, including the MLP, the RBFN, and the DNN. MFCC features, extracted from the preprocessed voice signals, serve as the input data. We find that the MLP and its pretrained counterpart, the DNN, achieve comparable recognition rates, and both are superior to the RBFN. The properties of each kind of neural network are also graphed and explored. Both the MLP and the RBFN reduce the word error rate rapidly in the early stage of learning, while the DNN gets a very good start after pretraining. Several tentative methods that revise the standard algorithms are further tested in an attempt to improve recognition. With a proper design, the speaker-dependent speech recognition rate reaches 95.4%. Our constructive training scheme also substantially shortens the training time, which is an issue for deeper or wider neural networks.
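The "good start after pretraining" refers to greedy layer-wise pretraining with autoencoders, the standard way to initialize a DNN of this kind. Below is a minimal NumPy sketch of the idea, not the thesis implementation: each layer is first trained to reconstruct its own input, and the learned encoder weights then initialize the corresponding MLP layer before supervised fine-tuning. The layer sizes, learning rate, epoch count, and 39-dimensional MFCC input are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, lr=0.5, epochs=100):
    """Train one sigmoid autoencoder (tied weights) and return its encoder."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b, c = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)           # encode
        R = sigmoid(H @ W.T + c)         # decode with the transposed weights
        G = (R - X) * R * (1.0 - R)      # gradient at the decoder pre-activation
        D = (G @ W) * H * (1.0 - H)      # gradient at the encoder pre-activation
        W -= lr * (X.T @ D + G.T @ H) / len(X)
        b -= lr * D.mean(axis=0)
        c -= lr * G.mean(axis=0)
    return W, b

# Greedy layer-wise stacking: each trained encoder's output feeds the next layer.
X = rng.random((200, 39))                # e.g. 39-dim MFCC frames (assumed)
stack = []
for n_hidden in (64, 64):                # illustrative layer sizes
    W, b = pretrain_layer(X, n_hidden)
    stack.append((W, b))
    X = sigmoid(X @ W + b)
# `stack` now initializes the hidden layers of an MLP; supervised
# backpropagation (fine-tuning) starts from these weights rather than random ones.
```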
Table of Contents
1. Introduction
2. Data
2.1. Voice Signal
2.1.1. 1,391 Mandarin Isolated Words
2.1.2. Recording
2.2. Data Preprocessing
2.2.1. Digitizing
2.2.2. Normalization
2.2.3. End Point Detection
2.2.3.1. Energy Calculation
2.2.3.2. Zero Crossing Rate
2.2.4. Frame Blocking
2.2.5. Pre-Emphasis
2.2.6. Hamming Window
2.3. Feature Extraction
2.3.1. Discrete Fourier Transform
2.3.2. Triangular Bandpass Filters
2.3.3. Boundary Points
2.3.4. Log-Energy
2.3.5. Discrete Cosine Transform
2.3.6. Mel-Frequency Cepstral Coefficients
3. Neural Network Implementation
3.1. Multilayer Perceptron
3.1.1. Introduction
3.1.2. Model
3.1.3. Algorithm
3.2. Radial Basis Function Network
3.2.1. Introduction
3.2.2. Model
3.2.3. Algorithm
3.3. Deep Neural Network
3.3.1. Introduction
3.3.2. Model
3.3.3. Algorithm
3.4. Additional Methods
3.4.1. Momentum Term
3.4.2. Parameter Initialization
3.4.3. Randomized Input
3.4.4. Mini-Batch Training
3.4.5. Validation Set
3.4.6. Adaptive Learning Rate
3.4.7. Pocket Algorithm
4. Experiments
4.1. Programming
4.2. MLP versus RBFN
4.2.1. Recognition of Tone 1 Single-Vowel Syllables
4.2.2. Recognition of All Tone 1 Syllables
4.2.3. Recognition of Syllables of All Tones
4.3. MLP versus DNN
4.4. Hyperparameter Selection
4.4.1. Weight Initialization
4.4.2. Momentum
4.4.3. Learning Rate
4.4.4. Activation Function
4.5. Other Tentative Methods
4.6. More RBFN Results
4.7. More MLP Results
4.8. More DNN Results
4.9. Vowel Recognition Using New Supervised Labels
5. Conclusions
5.1. MLP Achieves Good Recognition despite Its Simplicity
5.2. Both MLP and RBFN Converge Rapidly
5.3. DNN Outperforms MLP as the Network Gets Deeper
5.4. DNN Gets a Very Good Start after Pretraining
5.5. The Revised Decoding Algorithm of the Autoencoder is Favorable
5.6. Flexible Learning Rates Counterbalance the Vanishing Gradient
5.7. Softmax Normalization Improves Recognition
5.8. Constructive by-Frame Training Expedites Learning
5.9. RBFN is Responsive to Training
References
Appendix