
National Digital Library of Theses and Dissertations in Taiwan

Student: XU, XIAO-YIN (徐筱茵)
Title: Dysarthric-Speech Detection Using Convolutional Neural Networks with Gate Recurrent Unit
Title (Chinese): 使用帶有門控循環單元的卷積神經網絡 進行構音障礙檢測
Advisor: SHIH, DONG-HER (施東河)
Oral examination committee: LIAO, CHE-CHEN (廖則竣); CHENG, KUO-SHENG (鄭國順)
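Oral defense date: 2022-06-24
Degree: Master's
Institution: National Yunlin University of Science and Technology
Department: Department of Information Management
Field: Computing
Subfield: General Computing
Thesis type: Academic thesis
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar, i.e., 2021–2022)
Language: Chinese
Pages: 29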
Keywords: Dysarthria; Deep Learning; Convolutional Neural Network; Gated Recurrent Unit
In recent years, population growth and aging have driven a steady rise in the prevalence of neurological diseases, and patients with Parkinson's disease, stroke, cerebral palsy, and other neurological conditions often develop dysarthria. If dysarthria is not detected promptly and addressed through early intervention, disease management becomes difficult, and as symptoms worsen they can seriously affect patients' psychological and physical well-being. Previous studies on dysarthric-speech detection have mostly used machine learning models or deep-learning convolutional neural network (CNN) models as classifiers. This study proposes a CNN-GRU hybrid model, which adds gated recurrent units (GRUs) to a convolutional neural network, to detect dysarthria. The results show that the proposed CNN-GRU hybrid model achieves an accuracy of 98.88%, outperforming the models used in other studies.
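The abstract describes the CNN-GRU architecture only at a high level. The following is a minimal, hypothetical NumPy sketch of the general CNN-GRU idea, not the author's model: a 1-D convolution with ReLU and max pooling extracts local patterns from a sequence of acoustic feature frames (for example MFCCs, which the thesis lists under feature extraction), a GRU summarizes the pooled sequence, and a sigmoid unit outputs a dysarthric-vs-healthy probability. The function names, layer sizes, kernel width, pooling size, and the random untrained weights are all assumptions made for illustration. The record does not describe the actual configuration or training setup.

```python
import numpy as np


def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution over time, then ReLU.
    x: (T, F) feature frames; kernels: (K, F, C); bias: (C,). Returns (T-K+1, C)."""
    k = kernels.shape[0]
    out = np.empty((x.shape[0] - k + 1, kernels.shape[2]))
    for t in range(out.shape[0]):
        window = x[t:t + k]  # (K, F) slice of consecutive frames
        out[t] = np.einsum("kf,kfc->c", window, kernels) + bias
    return np.maximum(out, 0.0)


def max_pool_time(x, size):
    """Non-overlapping max pooling along time; leftover frames are dropped."""
    usable = (x.shape[0] // size) * size
    return x[:usable].reshape(-1, size, x.shape[1]).max(axis=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_last_state(x, p):
    """Run a single-layer GRU over x (T, D) and return the final hidden state (H,)."""
    h = np.zeros(p["Uz"].shape[0])
    for x_t in x:
        z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h + p["bz"])  # update gate
        r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h + p["br"])  # reset gate
        h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h) + p["bh"])  # candidate state
        h = (1.0 - z) * h + z * h_cand  # interpolate old and candidate state
    return h


def init_params(n_features, n_filters=16, kernel=3, hidden=32, seed=0):
    """Hypothetical sizes; weights are random and untrained."""
    rng = np.random.default_rng(seed)
    p = {
        "conv_k": rng.normal(0.0, 0.1, (kernel, n_features, n_filters)),
        "conv_b": np.zeros(n_filters),
        "out_w": rng.normal(0.0, 0.1, hidden),
        "out_b": 0.0,
    }
    for g in ("z", "r", "h"):
        p["W" + g] = rng.normal(0.0, 0.1, (hidden, n_filters))
        p["U" + g] = rng.normal(0.0, 0.1, (hidden, hidden))
        p["b" + g] = np.zeros(hidden)
    return p


def cnn_gru_predict(features, p, pool=2):
    """features: (T, F) acoustic frames. Returns a probability in (0, 1)."""
    x = conv1d_relu(features, p["conv_k"], p["conv_b"])  # CNN stage
    x = max_pool_time(x, pool)                           # downsample in time
    h = gru_last_state(x, p)                             # GRU stage
    return float(sigmoid(p["out_w"] @ h + p["out_b"]))   # binary output unit


# Usage: 100 synthetic frames of 13 MFCC-like coefficients.
frames = np.random.default_rng(1).normal(size=(100, 13))
params = init_params(n_features=13)
print(f"P(dysarthric) = {cnn_gru_predict(frames, params):.3f}")  # arbitrary: untrained weights
```

Because the weights are untrained, the printed probability carries no meaning. In a real system the parameters would be learned from labeled dysarthric and healthy recordings, typically with a deep-learning framework. The GRU update here uses the convention h_t = (1 − z_t) h_{t−1} + z_t h̃_t; some libraries swap the roles of z_t and 1 − z_t, which amounts to relabeling the gate.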
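Abstract (Chinese)
Abstract (English)
Table of Contents
1. Introduction
2. Literature Review
2.1 Dysarthria
2.2 Signal Processing
2.2.1 Fourier Transform
2.2.2 Short-Time Fourier Transform
2.3 Feature Extraction
2.3.1 Mel-Frequency Cepstral Coefficients
2.3.2 Linear Prediction Cepstral Coefficients
2.4 Deep Learning
2.4.1 Convolutional Neural Networks
2.4.2 Long Short-Term Memory
2.4.3 Gated Recurrent Units
2.4.4 CNN-GRU
3. Research Methods
3.1 Signal Preprocessing
3.2 Feature Extraction
3.3 CNN Model
3.4 LSTM Model
3.5 CNN-LSTM Model
3.6 CNN-GRU Model
4. Experiments
4.1 Dataset
4.2 Evaluation Metrics
4.3 Experimental Results
4.3.1 CNN Model Results
4.3.2 LSTM Model Results
4.3.3 CNN-LSTM Model Results
4.3.4 CNN-GRU Model Results
4.3.5 Performance Comparison with Other Methods
5. Conclusion
References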

