National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

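Graduate student: 陳霽穎 (Ji-Ying Chen)
Thesis title: 結合聲學訊號和病史紀錄之嗓音疾病分類演算法
Thesis title (English): Combining Acoustic Signals and Medical Record for Improved Pathological Voice Classification
Advisor: 方士豪 (Shih-Hau Fang)
Committee members: 曹昱 (Yu Tsao), 賴穎暉 (Ying-Hui Lai), 王棨德 (Chi-Te Wang), 王緒翔 (Syu-Siang Wang)
Oral defense date: 2019-01-18
Degree: Master's
Institution: 元智大學 (Yuan Ze University)
Department: Department of Electrical Engineering, Group A (電機工程學系甲組)
Discipline: Engineering
Sub-discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2019
Graduation academic year: 107 (ROC calendar)
Language: Chinese
Pages: 112
Keywords (Chinese): 病理聲音; 疾病分類; 聲音訊號; 病史紀錄; 人工智慧
Keywords (English): pathological voice; diseases classification; acoustic signal; medical record; artificial intelligence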
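As personal healthcare services receive growing attention, research that combines physiological signals produced by the human body, such as electrocardiogram signals, breathing, and voice, with artificial intelligence has steadily increased. This thesis analyzes voice. A voice-screening system issues a warning when a growth has formed in the throat or when the muscles cannot sustain phonation for long, and then attempts to identify the disease the patient may have. Such systems have drawn interest in many countries and are being incorporated into smart healthcare systems. This thesis proposes a deep-learning method to classify three common voice disorders. A deep neural network (DNN) can stack many hidden layers and optimizes its weights during training, so it can effectively build a nonlinear acoustic model.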

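The voice database used in this study was built by the Department of Otolaryngology at Far Eastern Memorial Hospital (亞東醫院). Disease classification uses 589 voice samples covering three common voice disorders: vocal fold tumors, voice misuse (e.g., polyps, nodules, and cysts), and vocal fold paralysis. Validation with three cepstrum-based acoustic features yields an accuracy of 76.94% and an unweighted average recall (UAR) of 64.25%. Validation with medical records yields an accuracy of 81.56% and a UAR of 73.65%. The experimental results on this database show that the proposed classification architecture is feasible. The comparison with traditional machine-learning algorithms confirms only that the DNN outperforms four commonly used classifiers, namely the Gaussian mixture model, support vector machine, decision tree, and k-nearest neighbors; the results can be improved further by combining voice with medical history.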

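To exploit the strengths of both acoustic signals and medical records, the thesis proposes two DNN-based methods to improve classification, called supervector and fusion learning. The former converts the dynamic voice signal into a static supervector through a Gaussian mixture model, which makes it easy to combine with the medical record at the feature level. The latter introduces a two-stage DNN in which the second stage processes the local results of the first stage together with the original signals. Experiments clearly show that the proposed fused features outperform either single feature. Compared with the single features (acoustic signal, medical record), the proposed supervector and fusion learning methods raise accuracy by 2.02-10.32% and UAR by 2.48-17.31%.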

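In addition, this study feeds voice and medical history to the classifier as images and uses a convolutional neural network (CNN) in place of the DNN, unlike earlier studies that classified from acoustic signals. Spectrogram input gives an accuracy of 80.52% and a UAR of 74.84%; medical-record images give an accuracy of 60.88% and a UAR of 41.81%. Only the conventional feature-based and model-based combination methods are then used to improve the results. The former repeats the medical record to match the number of frames in the voice recording, concatenates the two, and then trains. The latter multiplies the two output scores by weights, adds them, and diagnoses using the largest score. Experiments show that, with the CNN, only feature combination improves on the single features, raising accuracy by 1.38-21.02% and UAR by 2.33-35.36%.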
As personal healthcare services grow in importance, research that combines physiological signals of the human body, such as ECG, breathing, and voice signals, with machine learning has gradually increased. This study focuses on the detection of pathological voice. A pathological-voice detection system issues a warning when there is a growth in the throat or when the patient cannot speak for long, and then analyzes the possible disease. Such systems have attracted interest from many countries and are being incorporated into smart healthcare systems. This thesis proposes a deep-learning method to classify three common pathological voice categories. A deep neural network (DNN) can use multiple hidden layers and optimize its weights, so it can efficiently construct a nonlinear acoustic model.

The database used in this study was established by the Department of Otolaryngology Head and Neck Surgery, Far Eastern Memorial Hospital (FEMH). The study uses 589 samples for pathological voice classification, covering three common categories: glottic neoplasm, phonotraumatic lesions, and vocal paralysis. Using three cepstrum-based acoustic features, the highest accuracy was 76.94% and the unweighted average recall (UAR) was 64.25%. Using medical records, the highest accuracy was 81.56% and the UAR was 73.65%. These results show that the proposed classification architecture is feasible. Compared with traditional machine-learning algorithms, the results confirm only that the DNN outperforms four commonly used classifiers, namely Gaussian mixture models (GMMs), support vector machine (SVM), decision tree (DT), and k-nearest neighbor (KNN); performance still needs to be improved by combining voice and medical records.
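The gap between accuracy and UAR reported above is typical of imbalanced classes: UAR is the plain mean of the per-class recalls, so a small class counts as much as a large one. The following is a minimal sketch of the two metrics; the toy labels are invented for illustration and are not taken from the FEMH database.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally regardless of size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy data (invented): a classifier that always predicts the majority class.
y_true = ["lesion"] * 8 + ["neoplasm"] + ["paralysis"]
y_pred = ["lesion"] * 10

print(accuracy(y_true, y_pred))                   # 0.8
print(unweighted_average_recall(y_true, y_pred))  # about 0.33
```

In this toy case the majority-class predictor looks strong on accuracy but weak on UAR, which is why both figures are worth reporting together.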

To exploit the advantages of both acoustic signals and medical records, this thesis proposes two deep-learning-based methods to improve classification performance, called supervector and fusion learning. The former transforms the dynamic acoustic waveform into a static supervector via Gaussian mixture models, which can then be easily combined with the medical record. The latter embeds a two-stage deep neural network and iteratively refines the fusion process according to the local estimation from the first stage and the original signal statistics. Experiments clearly demonstrate that the proposed fusion approaches outperform either individual information source. Compared with acoustic signals and medical records alone, the proposed supervector and fusion learning algorithms improve accuracy by 2.02-10.32% and UAR by 2.48-17.31%.
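The supervector idea can be sketched as follows. The abstract does not give the exact recipe, so this sketch assumes the common one from speaker verification: mean-only MAP adaptation of a background GMM, followed by stacking the adapted means. The background means, the equal-weight spherical components, the relevance factor, and the toy medical-record vector are all illustrative assumptions, not values from the thesis.

```python
import math


def posteriors(frame, means, var=1.0):
    """Responsibilities of equal-weight, spherical Gaussians for one frame (assumed form)."""
    log_p = [-sum((x - m) ** 2 for x, m in zip(frame, mu)) / (2.0 * var) for mu in means]
    top = max(log_p)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - top) for lp in log_p]
    total = sum(weights)
    return [w / total for w in weights]


def supervector(frames, ubm_means, relevance=16.0):
    """Adapt the background means to one recording, then stack them into one vector."""
    k, d = len(ubm_means), len(ubm_means[0])
    counts = [0.0] * k
    sums = [[0.0] * d for _ in range(k)]
    for frame in frames:
        for c, gamma in enumerate(posteriors(frame, ubm_means)):
            counts[c] += gamma
            for j in range(d):
                sums[c][j] += gamma * frame[j]
    vec = []
    for c in range(k):
        alpha = counts[c] / (counts[c] + relevance)  # data vs. prior trade-off
        for j in range(d):
            data_mean = sums[c][j] / counts[c] if counts[c] > 0 else ubm_means[c][j]
            vec.append(alpha * data_mean + (1.0 - alpha) * ubm_means[c][j])
    return vec


# Hypothetical background model with 2 components over 2-dimensional features.
ubm_means = [[0.0, 0.0], [4.0, 4.0]]
short_rec = [[0.1, -0.2], [3.9, 4.2], [0.3, 0.1]]
long_rec = [[0.2, 0.0], [4.1, 3.8], [3.7, 4.0], [0.0, 0.4],
            [4.2, 4.1], [-0.1, 0.1], [3.9, 3.9]]
record = [1.0, 0.0, 35.0]  # invented medical-record features

sv_short = supervector(short_rec, ubm_means)
sv_long = supervector(long_rec, ubm_means)
fused = sv_short + record  # static vectors concatenate directly

print(len(sv_short), len(sv_long), len(fused))
```

Recordings with different numbers of frames map to vectors of the same length (components times feature dimension), which is what lets a per-patient medical record be appended without any repetition across frames.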

In addition, the study represents the acoustic signals and medical history as images and replaces the DNN with a convolutional neural network (CNN) for classification, in contrast to past studies that classified acoustic signals directly. The spectrogram input achieves an accuracy of 80.52% and a UAR of 74.84%, and the medical-record image achieves an accuracy of 60.88% and a UAR of 41.81%. The thesis then applies only feature-based and model-based combination to improve on the original features. The former repeats the medical history once per voice frame and concatenates the features to train a model. The latter weights the model outputs for the two features and sums the scores to diagnose the disease. Experiments demonstrate that, with the CNN, feature-based combination improves accuracy by 1.38-21.02% and UAR by 2.33-35.36% compared with the original features.
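The two combination schemes described above can be sketched in a few lines. This is only an illustration of the data flow: the function names, the toy inputs, and the choice of complementary weights (w and 1 - w) are assumptions, since the abstract does not state how the weights are set.

```python
def feature_based_combination(frames, record):
    """Repeat the medical-record vector once per voice frame and concatenate."""
    return [list(frame) + list(record) for frame in frames]


def model_based_combination(acoustic_scores, record_scores, weight=0.5):
    """Weighted sum of the two models' class scores; the largest fused score wins."""
    fused = [weight * a + (1.0 - weight) * r
             for a, r in zip(acoustic_scores, record_scores)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Toy inputs (invented): 3 voice frames with 2 features each, 2 record features.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
record = [1.0, 0.0]
stacked = feature_based_combination(frames, record)
print(stacked[0])  # [0.1, 0.2, 1.0, 0.0]

# Toy class scores from an acoustic model and a medical-record model.
label, fused_scores = model_based_combination([0.5, 0.3, 0.2], [0.2, 0.7, 0.1])
print(label)  # 1
```

Feature-level fusion yields one training row per frame with the record appended, while score-level fusion leaves the two models independent and only merges their outputs.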
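Title page
Thesis oral defense committee approval form
Chinese abstract
English abstract
Acknowledgments
Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Research motivation and objectives
1.2 Related work
1.3 Thesis organization
Chapter 2. Theoretical background
2.1 Mel-scale Frequency Cepstral Coefficients (MFCC)
2.2 Traditional machine learning algorithms
2.2.1 Support Vector Machine (SVM)
2.2.2 Gaussian Mixture Model (GMM)
2.2.3 K-Nearest Neighbor (KNN)
2.2.4 Decision Tree
2.3 Deep Neural Network (DNN)
2.4 Convolutional Neural Network (CNN)
Chapter 3. Data description and experimental setup
3.1 Data description
3.1.1 Voice database
3.1.2 Medical record database
3.2 Experimental setup and performance metrics
Chapter 4. Proposed deep-learning combination methods
4.1 Multimodal processing of features
4.1.1 Feature-based Combination
4.1.2 Supervector
4.2 Multimodal processing of models
4.2.1 Model-based Combination
4.2.2 Fusion learning
4.3 Further improvement with CNN
4.3.1 Feature-based Combination
4.3.2 Model-based Combination
Chapter 5. Analysis of voice pathology results
5.1 Pathology classification results
5.1.1 Comparison of acoustic signal results
5.1.2 Comparison of medical record results
5.1.3 Comparison of multimodal methods
5.2 Analysis of pathological voice signals
5.3 Results of spectrogram-based pathology classification
Chapter 6. Conclusion and future work
Bibliography
Appendix A: Medical history statistics, feature representation, and questionnaire content
Appendix B: Parameter tuning and classifier comparison for acoustic signals
Appendix C: Parameter tuning and classifier comparison for medical records
Appendix D: Parameter tuning for feature-based combination
Appendix E: Parameter tuning for model-based combination
Appendix F: Parameter tuning for supervector
Appendix G: Parameter tuning for fusion learning
Appendix H: Distribution of misclassified cases
Appendix I: Convolutional neural network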