
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: Wu, Chien-Lin (吳建霖)
Title: Building predictive models for severe mental illness: a comparative analysis of machine learning and deep learning algorithms
Title (Chinese): 建構嚴重型精神疾病的預測模型:機器學習和深度學習演算法的比較分析
Advisor: Yang, Meng-Han (楊孟翰)
Oral defense committee: Yang, Meng-Han (楊孟翰); Chen, Chun-Hao (陳俊豪); Chung, Wen-Yu (鐘文鈺); Lin, Wei-Cheng (林威成)
Oral defense date: 2022-06-28
Degree: Master's
Institution: National Kaohsiung University of Science and Technology (國立高雄科技大學)
Department: Department of Computer Science and Information Engineering (資訊工程系)
Discipline: Engineering
Field: Electrical and computer engineering
Thesis type: Academic thesis
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar, 2021-2022)
Language: Chinese
Pages: 59
Keywords: multilayer perceptron (MLP); convolutional neural network (CNN); recurrent neural network (RNN); long short-term memory (LSTM); gated recurrent unit (GRU); decision tree (DT); random forest (RF); support vector machine (SVM); MIMIC (Medical Information Mart for Intensive Care)
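Severe mental illness is a category of serious psychiatric disorders that includes bipolar disorder, schizophrenia, neurosis, and organic psychosis. This study focuses on bipolar disorder and schizophrenia as the source of its experimental data. An estimated 0.3% to 0.7% of the population is diagnosed with a severe mental illness at some point in life, and as of 2019 there were approximately 20 million cases worldwide. Compared with the general population, people with severe mental illness have more physical illnesses, which reduces average life expectancy by about 20 years.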
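Research suggests that patients with severe mental illness are more prone to cardiovascular and cerebrovascular disease, yet explaining the physiology and pathology of cerebrovascular lesions in these patients remains a clinical challenge. During manic episodes, patients with bipolar disorder tend to feel unusually energetic, happy, or irritable, and often make impulsive decisions with little regard for the consequences. Patients with schizophrenia are more prone to hallucinations, delusions, and disorganized thinking, which in turn lead to social withdrawal and reduced emotional expression. Both sets of symptoms also appear to have some influence on cardiovascular and cerebrovascular disease. Moreover, although severe mental illness may raise the risk of cardiovascular and cerebrovascular disease, that risk does not necessarily receive sufficient attention. To spread knowledge in primary care about the comorbidity among severe mental illness, the side effects of antipsychotic drugs, and cardiovascular and cerebrovascular disease, this thesis builds models that assess risk quantitatively from multiple variables. The data source is the publicly shared Medical Information Mart for Intensive Care (MIMIC), which contains de-identified electronic health record data for thousands of adult patients in medical and surgical intensive care units and emergency department wards.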
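In this work, we use word2vec to produce semantically meaningful embedding vectors for the diagnosis codes in MIMIC. MIMIC records diagnoses with codes defined by the International Classification of Diseases (ICD), in both the ninth revision (ICD-9) and the tenth revision (ICD-10). We build a separate set of word embeddings for each revision so that model predictions on the two can be compared.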
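As a rough illustration of how diagnosis codes can be treated like words, the sketch below generates skip-gram style (center, context) training pairs from one patient's code sequence. It assumes the skip-gram variant of word2vec (the thesis outline lists skip-gram figures, but the abstract does not state the configuration), and the function name `skipgram_pairs`, the window size, and the code strings are hypothetical placeholders rather than values taken from the thesis.

```python
from typing import List, Tuple


def skipgram_pairs(codes: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every code and its neighbors.

    Each diagnosis code plays the role of a word, and one patient's list of
    codes plays the role of a sentence.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(codes):
        lo = max(0, i - window)
        hi = min(len(codes), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a code is not its own context
                pairs.append((center, codes[j]))
    return pairs


# Hypothetical code sequence for a single admission (illustrative strings only).
admission = ["29620", "4019", "25000"]
pairs = skipgram_pairs(admission, window=1)
for center, context in pairs:
    print(center, "->", context)
```

An embedding model trained on such pairs places codes that tend to co-occur close together in vector space, which is the property the downstream classifiers rely on.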
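For prediction, we use deep learning models (MLP, CNN, RNN, LSTM, GRU) and machine learning models (decision tree, random forest, SVM), and evaluate each on the ICD-9 and ICD-10 data from MIMIC. Among the deep learning models, the MLP is configured by the number of hidden layers, the number of nodes per hidden layer, and whether dropout is used. The CNN uses convolutional and pooling layers to extract features from the data while retaining the important content. The RNN feeds the hidden-layer result computed for one input back into the hidden-layer computation for the next input. The LSTM adds gates that decide whether the recurrent result should be discarded, which gives it both long-term and short-term memory. The GRU simplifies the structure and parameters of the LSTM to improve efficiency and reduce training time. Among the machine learning models, the decision tree uses an evaluation criterion to judge the quality of each split and then classifies and predicts; the random forest builds many decision trees from random samples and combines them by bagging; and the SVM uses a kernel function to define the margins that separate the classes. In summary, when trained on 20,000 records in total (experimental and control groups combined), the deep learning models reached about 70% accuracy and the machine learning models about 65%, a difference of roughly 5 percentage points. When trained on 60,000 records in total, the deep learning models reached about 90% accuracy and the machine learning models about 83%, a difference of roughly 7 percentage points. This indicates a clear difference in training effectiveness between machine learning and deep learning.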
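The gating idea described above can be made concrete with a single GRU time step. This is a minimal sketch of the standard GRU equations in plain Python, not the thesis's implementation: the parameter names (`Wz`, `Uz`, `bz`, and so on), the dictionary layout, and the all-zero demo weights are assumptions for illustration, and the update uses the convention h' = (1 - z) * h + z * candidate (some references swap the roles of z and 1 - z).

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W: Matrix, x: Vector, U: Matrix, h: Vector, b: Vector) -> Vector:
    """Compute W @ x + U @ h + b, one output row at a time."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]


def gru_step(x: Vector, h: Vector, p: Dict[str, list]) -> Vector:
    """One GRU step: update gate z, reset gate r, candidate state, new state."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]
    reset_h = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the old state
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], reset_h, p["bh"])]
    # The update gate blends the previous state with the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


# Demo with all-zero weights (hypothetical): both gates equal 0.5 and the
# candidate is 0, so the new state is half of the previous state.
params = {
    "Wz": zeros(2, 2), "Uz": zeros(2, 2), "bz": [0.0, 0.0],
    "Wr": zeros(2, 2), "Ur": zeros(2, 2), "br": [0.0, 0.0],
    "Wh": zeros(2, 2), "Uh": zeros(2, 2), "bh": [0.0, 0.0],
}
h_next = gru_step([1.0, 2.0], [0.5, -0.5], params)
print(h_next)
```

Compared with an LSTM, this cell keeps a single state vector and two gates instead of a separate cell state and three gates, which is where the reduction in parameters comes from.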
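This thesis divides MIMIC patients into an experimental group and a control group according to the presence or absence of severe mental illness. A word embedding model then converts the diagnosis codes into vectors that carry disease-level meaning, and the separate ICD-9 and ICD-10 embeddings are fed into the machine learning and deep learning models so that each model predicts whether a record belongs to the experimental or the control group. Precision and recall are used alongside accuracy to evaluate each model. The results show that the GRU performs better than the other models in accuracy and in several other metrics. The metrics also show that the gap between machine learning and deep learning widens as the dataset grows, and that accuracy rises noticeably for every model. This suggests that larger datasets make the predictions more informative, and the experiments across all models support the feasibility of using diagnosis codes for prediction.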
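To show why precision and recall are reported alongside accuracy, the sketch below computes all three from binary labels. The label encoding (1 = experimental group, 0 = control group), the function name `binary_metrics`, and the always-predict-control baseline are assumptions for illustration only; they are not results or methods from the thesis. The 10,000:50,000 split mirrors the 60,000-record configuration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Defined as 0.0 when the denominator is zero (no positive predictions
        # or no positive labels), a common but not universal convention.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical imbalanced dataset: 10,000 experimental and 50,000 control records.
y_true = [1] * 10_000 + [0] * 50_000
always_control = [0] * 60_000  # trivial baseline that never predicts the experimental group
baseline = binary_metrics(y_true, always_control)
print(baseline)
```

With a 1:5 class ratio, this trivial baseline scores 50,000 / 60,000, or about 83.3%, in accuracy while its recall is 0, so accuracy figures on imbalanced splits are easier to interpret when read together with precision and recall.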

Severe mental illness is a category of serious mental disorders that includes bipolar disorder, schizophrenia, neurosis, and organic psychosis. This study focuses on bipolar disorder and schizophrenia as its experimental data. An estimated 0.3% to 0.7% of the population is diagnosed with a severe mental illness over a lifetime, and as of 2019 there were approximately 20 million cases worldwide. Compared with the general population, people with severe mental illness suffer from more physical illnesses, resulting in an average reduction in life expectancy of about 20 years.
According to research, people with severe mental illness appear to be more prone to cardiovascular and cerebrovascular disease, but explaining the physiology and pathology of cerebrovascular disease in these patients poses a clinical challenge. Patients with schizophrenia are more prone to hallucinations, delusions, and disorganized thinking, which can lead to social withdrawal and reduced emotional expression. In addition, severe mental illness may increase the risk of cardiovascular and cerebrovascular disease, yet this risk is not given sufficient attention. To disseminate knowledge in primary care about the comorbidity among severe mental illness, antipsychotic side effects, and cardiovascular and cerebrovascular disease, this thesis addresses these challenges by building models with a multivariate, quantitative risk assessment approach. The study uses the publicly available Medical Information Mart for Intensive Care (MIMIC) as its data source, which contains de-identified electronic medical records of thousands of adult patients in medical and surgical intensive care units and emergency department wards.
In this study, we used word2vec to build semantic embedding vectors for the diagnosis codes in MIMIC. MIMIC uses diagnosis codes defined by the International Classification of Diseases (ICD) in both the ninth revision (ICD-9) and the tenth revision (ICD-10), and a separate set of word embeddings is built for each revision to facilitate comparison of model predictions.
For prediction, deep learning models (MLP, CNN, RNN, LSTM, GRU) and machine learning models (decision tree, random forest, SVM) are evaluated on the ICD-9 and ICD-10 data from MIMIC. In deep learning, the MLP is configured by the number of hidden layers, the number of nodes per hidden layer, and whether dropout is used. In the CNN, convolutional and pooling layers capture the features of the data and retain the important content. The GRU simplifies the structure and parameters of the LSTM to improve efficiency and reduce time cost. In the SVM, a kernel function defines the margins that separate the classes of data. In summary, when trained on 20,000 records in total across the experimental and control groups, the deep learning models were approximately 70% accurate and the machine learning models approximately 65% accurate, a difference of about 5 percentage points. When trained on 60,000 records in total, the deep learning models were about 90% accurate and the machine learning models about 83% accurate, a difference of about 7 percentage points.
In this thesis, the experimental and control groups are distinguished by the presence or absence of severe mental illness in MIMIC, and the word embedding model converts the diagnosis codes into vectors that carry disease-level meaning. The results show that the GRU performs better than the other models in accuracy and several other metrics. The gap between machine learning and deep learning also widens as the dataset grows, and accuracy increases noticeably for every model, indicating that larger datasets make the predictions more informative. The experiments across all models support the feasibility of using diagnosis codes for prediction.

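Table of Contents
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
1. Introduction
1.1 Research Background and Motivation
1.2 Overview of the Analysis Workflow
1.3 Thesis Organization
2. Literature Review
2.1 Severe Mental Illness
2.2 Machine Learning
2.2.1 Loss Functions and Error Backpropagation
2.2.2 Support Vector Machines
2.2.3 Decision Trees
2.3 Deep Learning
2.3.1 Convolutional Neural Networks
2.3.2 Word Embeddings
2.3.3 Recurrent Neural Networks
3. Research Methods
3.1 Data Source
3.2 Data Preprocessing
3.2.1 Data Selection
3.3 Data Analysis Workflow
3.4 Machine Learning Models
3.4.1 Support Vector Machine Model
3.4.2 Decision Tree and Random Forest Models
3.5 Deep Learning Models
3.5.1 Multilayer Perceptron Model
3.5.2 Convolutional Neural Network Model
3.5.3 Convolutional Neural Network + Multilayer Perceptron Model
3.5.4 word2vec Preprocessing for ICD-9 and ICD-10
3.5.5 Recurrent Neural Network, Long Short-Term Memory, and Gated Recurrent Unit Models
4. Experimental Results
4.1 Evaluation Metrics for Machine Learning and Deep Learning
4.1.1 10,000 Experimental Records versus 10,000 Control Records
4.1.2 10,000 Experimental Records versus 30,000 Control Records
4.1.3 10,000 Experimental Records versus 50,000 Control Records
4.2 Data Structure Analysis of ICD-9 and ICD-10
4.2.1 Correlation Coefficients between the Experimental and Control Groups
4.2.2 Distance between the Experimental and Control Groups
4.2.3 Distribution of the Experimental and Control Groups
5. Conclusion
References
Appendix 1
Appendix 2
Appendix 3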




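List of Figures
Figure 1 SVM model
Figure 2 Decision tree model
Figure 3 Random forest model
Figure 4 Neural network neuron diagram
Figure 5 Backpropagation optimization in a neural network
Figure 6 Neural network architecture
Figure 7 CNN architecture
Figure 8 Skip-gram encoding diagram
Figure 9 Skip-gram network diagram
Figure 10 RNN architecture
Figure 11 LSTM architecture
Figure 12 LSTM concept diagram
Figure 13 GRU concept diagram
Figure 14 GRU architecture
Figure 15 Data preprocessing flowchart
Figure 16 Data analysis flowchart
Figure 17 CNN + MLP architecture
Figure 18 Distribution of the experimental and control groups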




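List of Tables
Table 1 SVM parameters
Table 2 Decision tree parameters
Table 3 Random forest parameters
Table 4 MLP parameters
Table 5 CNN parameters
Table 6 CNN + MLP parameters
Table 7 word2vec parameters
Table 8 RNN parameters
Table 9 LSTM parameters
Table 10 GRU parameters
Table 11 ICD-9 metrics for 10,000:10,000 records
Table 12 ICD-10 metrics for 10,000:10,000 records
Table 13 ICD-9 metrics for 10,000:30,000 records
Table 14 ICD-10 metrics for 10,000:30,000 records
Table 15 ICD-9 metrics for 10,000:50,000 records
Table 16 ICD-10 metrics for 10,000:50,000 records
Table 17 ICD-9 correlation coefficients at each data ratio
Table 18 ICD-10 correlation coefficients at each data ratio
Table 19 ICD-9 distances at each data ratio
Table 20 ICD-10 distances at each data ratio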



