Graduate student (English name): LIN, LING-JUNG
Thesis title (Chinese): 運用主訴進行語音及文字探勘評估病人之憂鬱程度及自殺風險
Thesis title (English): Using Chief Complaint for Speech Analytics and Text Mining to Assess Depression and Suicide Risk of Patients
Keywords (English): Depression; Suicide risk; Text mining; Speech analytics
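This study collected basic information, chief complaint audio recordings, and the results of related scales from 200 cases. The Hamilton Depression Rating Scale (HAMD) and the SAD PERSONS scale served as the two dependent variables. Text features were generated with methods such as sentiment analysis and Term Frequency-Inverse Document Frequency (TF-IDF), speech features were obtained with the Cool Edit tool, and predictive models were built with decision trees, K-nearest neighbors, neural networks, support vector machines, logistic regression, random forests, and adaptive boosting (AdaBoost).

In the experiments predicting depressive tendency, the random forest model built on text, speech, and personal data performed best (AUC = 0.92). In the prediction of suicide risk, the logistic regression model performed best (AUC = 0.939), and the experiments further showed that anxiety and sleep problems affect the level of suicide risk. The study also showed that, when predicting depression severity and suicide risk, including both text features and speech features improves model performance compared with considering only a single type of feature.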
In recent years, depression has become a common disease of modern civilization. Patients with depression show more symptoms than people without depression, and their quality of life is also affected. A WHO report shows that depression is one of the main predisposing factors for suicide, and people with severe depression may even die as a result of suicidal behavior. If people at high risk of suicide cannot be detected and treated in time, the result may be heavy health losses and economic burdens. This study therefore applies text and speech analytics to patients' chief complaints to build predictive models, using a depression scale and a suicide risk assessment scale to assess the severity of depression and the risk of suicide. The models are intended to provide clinical reference suggestions for medical staff.

The study collected data from 200 cases, including personal information, chief complaint recordings, and the results of related scales. The HAMD and the SAD PERSONS scale were used as the two dependent variables. Sentiment analysis and TF-IDF were used to generate text features, and the Cool Edit tool was used to obtain speech features. Finally, Decision Trees, K-Nearest Neighbor, Neural Network, Support Vector Machine, Logistic Regression, Random Forests, and AdaBoost were used to build predictive models.
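
As a rough illustration of the TF-IDF step mentioned above, the following sketch computes term weights for a few tokenized documents. It is not the thesis's implementation: the abstract does not state which TF-IDF variant or tokenizer was used, so the formula (tf = count / document length, idf = log(N / df)), the function name `tf_idf`, and the toy English tokens are all assumptions. Real chief complaint transcripts would first need word segmentation.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Smoothed or normalized variants are equally common.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, already-tokenized complaints (not data from the study).
docs = [
    ["cannot", "sleep", "feel", "anxious"],
    ["feel", "tired", "cannot", "sleep"],
    ["feel", "fine", "today"],
]
weights = tf_idf(docs)
```

Under this variant, a term that appears in every document (here `feel`) receives weight zero, while a term unique to one document (here `anxious`) receives the highest weight in that document, which is why TF-IDF tends to highlight wording that distinguishes one complaint from the others.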

In the experiments predicting the severity of depression, the random forest classifier built on text, speech, and personal data performed best (AUC = 0.92). In the experiments predicting suicide risk, the logistic regression classifier performed best (AUC = 0.939), and the results showed that anxiety and sleep problems affect the level of suicide risk. The study also showed that, when predicting depression and suicide risk, including text features and speech features together improves model performance compared with considering only a single category of features.
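
Because the models above are compared by AUC, a minimal sketch of how that metric can be computed may help. It uses the rank interpretation of AUC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case). The function name `roc_auc`, the labels, and both score lists are hypothetical and serve only as illustration; they are not results from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = high risk) and scores from two hypothetical models.
labels = [1, 1, 1, 0, 0, 0]
scores_model_a = [0.9, 0.4, 0.6, 0.5, 0.3, 0.7]
scores_model_b = [0.9, 0.6, 0.8, 0.5, 0.3, 0.7]
auc_a = roc_auc(labels, scores_model_a)
auc_b = roc_auc(labels, scores_model_b)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a model trained on one feature set can be compared with a model trained on another by computing this value on the same held-out cases.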
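Chapter 1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
Chapter 2. Literature Review
2.1 Text-Based Depression Research
2.2 Speech-Based Depression Research
Chapter 3. Research Methods
3.1 Research Framework and Process
3.2 Data Collection
3.3 Variable Definitions
3.3.1 Definition of Dependent Variables
3.3.2 Definition of Independent Variables
3.4 Data Preprocessing
3.5 Feature Extraction
3.5.1 Term Frequency-Inverse Document Frequency
3.5.2 Sentiment Analysis
3.5.3 Cool Edit pro 2.1
3.6 Data Mining Techniques
3.6.1 Decision Tree
3.6.2 K-Nearest Neighbors Algorithm
3.6.3 Artificial Neural Network
3.6.4 Support Vector Machine
3.6.5 Logistic Regression
3.6.6 Random Forest
3.6.7 Adaptive Boosting
3.7 Experimental Design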
3.8 Experimental Validation and Evaluation Metrics
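Chapter 4. Experiment Construction and Evaluation
4.1 Experimental Data
4.2 Experimental Results and Evaluation
4.2.1 Experiment 1
4.2.2 Experiment 2
4.2.3 Experiment 3
4.2.4 Experiment 4
4.2.5 Experiment 5
4.2.6 Experiment 6
4.2.7 Experiment 7
4.2.8 Experiment 8
4.3 Discussion
Chapter 5. Conclusions and Recommendations
5.1 Research Conclusions
5.2 Research Limitations
5.3 Future Research Directions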

