With the growing importance of personal healthcare services, research that combines machine learning with physiological signals produced by the human body, such as ECG, respiration, and voice signals, has increased steadily. One such line of research, and the focus of this study, is the detection of pathological voice. A pathological voice detection system issues a warning when, for example, an abnormal growth appears in the throat or the patient is unable to speak for a long period, and then analyzes the possible disease for the patient. Such systems have attracted interest in many countries and have been incorporated into smart healthcare technologies. This paper proposes a deep-learning-based method to classify three common voice pathologies. Because a deep neural network (DNN) stacks multiple layers of neurons whose weights are jointly optimized, it can efficiently construct a nonlinear acoustic model.
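As an illustrative sketch only, the following Python/Keras snippet shows a DNN of this kind for the three-class task; the feature dimension, layer sizes, and optimizer are assumptions for illustration, not the configuration reported in this work.

```python
# Minimal sketch of a multilayer DNN classifier for three voice-pathology
# classes. The 39-dimensional cepstral input and the 256-unit hidden
# layers are assumed values, not the paper's reported architecture.
import numpy as np
from tensorflow import keras

NUM_CLASSES = 3   # glottic neoplasm, phonotraumatic lesions, vocal paralysis
FEATURE_DIM = 39  # assumed cepstral feature dimension

model = keras.Sequential([
    keras.layers.Input(shape=(FEATURE_DIM,)),
    keras.layers.Dense(256, activation="relu"),  # stacked nonlinear layers
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training call on placeholder data with integer class labels.
X = np.random.randn(100, FEATURE_DIM).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=100)
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
```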
The database used in this study was collected by the Department of Otolaryngology-Head and Neck Surgery at Far Eastern Memorial Hospital (FEMH). The study uses 589 samples for pathological voice classification, covering three common categories: glottic neoplasm, phonotraumatic lesions, and vocal paralysis. Three cepstral acoustic features were evaluated. For the acoustic signals, the highest accuracy was 76.94% and the unweighted average recall (UAR) was 64.25%. We also evaluated classification using the medical records, for which the highest accuracy was 81.56% and the UAR was 73.65%. The experimental results on this database demonstrate the feasibility of the classification architecture proposed in this paper. Compared with traditional machine learning algorithms, the DNN outperforms four commonly used classifiers, namely Gaussian mixture models (GMMs), support vector machines (SVMs), decision trees (DTs), and k-nearest neighbors (KNN). However, further improvement requires combining the voice signals with the medical records.
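For reference, accuracy and UAR as reported above can be computed as in the sketch below; UAR is the macro-averaged (per-class) recall, so each class contributes equally regardless of its sample count. The labels are placeholders, not FEMH data.

```python
# Accuracy versus UAR (unweighted average recall). UAR averages the
# recall of each class with equal weight, which matters when the
# pathology classes have unequal sample counts.
from sklearn.metrics import accuracy_score, recall_score

y_true = [0, 0, 0, 0, 1, 1, 2]  # placeholder ground-truth labels
y_pred = [0, 0, 0, 1, 1, 0, 2]  # placeholder predictions

accuracy = accuracy_score(y_true, y_pred)
uar = recall_score(y_true, y_pred, average="macro")  # mean per-class recall
print(f"accuracy={accuracy:.4f}, UAR={uar:.4f}")
```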
To exploit the advantages of both feature types, the acoustic signals and the medical records, this paper proposes two deep-learning-based methods to improve classification performance, called supervector learning and fusion learning. The former transforms the dynamic acoustic waves into a static supervector via Gaussian mixture models so that they can easily be combined with the medical records. The latter embeds a two-stage deep neural network and iteratively refines the fusion process according to the local estimation from the first stage and the original signal statistics. Experiments clearly demonstrated that the proposed fusion approaches outperform any individual information source. The proposed supervector and fusion learning algorithms improve the accuracy and UAR by 2.02-10.32% and 2.48-17.31%, respectively, compared with using the acoustic signals or the medical records alone.
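A minimal sketch of the supervector idea follows, assuming a small universal background GMM and posterior-weighted component means as the static representation; the component count, feature dimensions, and adaptation details are illustrative assumptions, since the exact construction is not specified here.

```python
# Sketch: map a variable-length sequence of acoustic frames to one
# fixed-length GMM supervector, which can then be concatenated with a
# static medical-record vector. All shapes and the 8-component GMM are
# assumed for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(frames: np.ndarray, ubm: GaussianMixture) -> np.ndarray:
    """Stack posterior-weighted per-component means into a static vector."""
    post = ubm.predict_proba(frames)            # (n_frames, n_components)
    counts = post.sum(axis=0) + 1e-8            # soft frame count per component
    means = post.T @ frames / counts[:, None]   # per-component weighted mean
    return means.ravel()                        # length: n_components * feat_dim

# Fit a small "universal background" GMM on pooled training frames.
train_frames = np.random.randn(2000, 13)        # placeholder cepstral frames
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(train_frames)

utterance = np.random.randn(317, 13)            # any number of frames works
sv = gmm_supervector(utterance, ubm)            # shape (8 * 13,) = (104,)
record = np.random.randn(20)                    # placeholder medical record
fused_input = np.concatenate([sv, record])      # single static input vector
```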
In addition, this study uses images derived from the acoustic signals and the medical history as input features, replacing the DNN with a convolutional neural network (CNN) for classification. Unlike past studies that classify the acoustic signals directly, this approach achieves an accuracy of 80.52% and a UAR of 74.84% for the spectrogram, and an accuracy of 60.88% and a UAR of 41.81% for the medical record image. The paper then applies feature-based and model-based combination to improve on the individual features. In the former, the medical history is repeated to match the number of voice frames and concatenated with the acoustic features to train a single model. In the latter, the outputs of the two single-feature models are weighted and their scores summed to diagnose the disease. Experiments demonstrated that, compared with the original features, the feature-based combination improves the accuracy and UAR by 1.38-21.02% and 2.33-35.36%, respectively, with the CNN.
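The sketch below illustrates the mechanics of both combination strategies with placeholder shapes and a hypothetical fusion weight alpha; it is a simplified illustration rather than the exact pipeline of this study.

```python
# Sketch of feature-based versus model-based combination. Dimensions,
# scores, and the fusion weight alpha are placeholder assumptions.
import numpy as np

n_frames, acoustic_dim, record_dim = 300, 13, 20
frames = np.random.randn(n_frames, acoustic_dim)  # per-frame acoustic features
record = np.random.randn(record_dim)              # one medical-record vector

# Feature-based: repeat the record once per voice frame and concatenate,
# so a single model is trained on frames carrying both sources.
record_tiled = np.tile(record, (n_frames, 1))               # (n_frames, 20)
combined = np.concatenate([frames, record_tiled], axis=1)   # (n_frames, 33)

# Model-based: weight each single-feature model's class scores and sum.
acoustic_scores = np.array([0.6, 0.3, 0.1])  # softmax output, acoustic model
record_scores = np.array([0.2, 0.5, 0.3])    # softmax output, record model
alpha = 0.7                                  # hypothetical fusion weight
fused_scores = alpha * acoustic_scores + (1 - alpha) * record_scores
prediction = int(np.argmax(fused_scores))    # diagnosed class index
```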