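Author: 游雅雯 (Ya-wen Yu)
Thesis title: 建構大腸鏡異常發現之預測模式
Thesis title (English): Constructing the Predictive Model of Colonoscopy's Abnormal Findings
Advisor: 鄭博文
Degree: Master's
Institution: National Yunlin University of Science and Technology (國立雲林科技大學)
Department: Graduate School of Industrial Engineering and Management, Master's Program
Discipline: Engineering
Field: Industrial Engineering
Thesis type: Academic thesis
Year of publication: 2009
Graduation academic year: 97 (ROC calendar)
Language: Chinese
Number of pages: 75
Keywords (Chinese, translated): logistic regression model; colonoscopy; support vector machine; Random Forest; artificial neural network; decision tree J48; Bayesian classifier
Keywords (English): Colonoscopy; Logistic regression; ANN; Random Forest; Decision tree J48; Naive Bayes classifier; SVM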
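Chinese abstract:
In recent years, malignant tumors have consistently ranked first among the ten leading causes of death in Taiwan, and by 2006 colorectal cancer had risen to second place in domestic cancer incidence. According to the cause-of-death statistics for colorectal cancer from 1997 to 2007 published by the Department of Health, Executive Yuan, about 2,855 people had colorectal cancer in 1997 (a mortality rate of 14 per 100,000), and about 4,470 people died of colorectal cancer in 2007 (a mortality rate of 16 per 100,000). Comparing the mortality figures over these ten years shows a rapid year-by-year increase.
This study uses the database of the health examination center of a case hospital in Taiwan. It applies six data mining methods (logistic regression, decision tree J48, artificial neural network, Random Forest, support vector machine, and a Bayesian classifier) to build colonoscopy prediction models, compares the predictive ability of the six methods, and mines the health examination factors associated with colon abnormalities.
The results show that the Random Forest prediction model performed best, with an accuracy of 70.90%, a receiver operating characteristic (ROC) value of 0.72, a precision of 0.75, and a recall of 0.86. The influential factors were age, sex, body mass index, triglycerides, carcinoembryonic antigen level, fasting blood glucose, and diastolic blood pressure. The findings can help hospitals and the public evaluate health examination items, with the aim of early diagnosis and early treatment.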
English abstract:
By 2006, colorectal cancer had risen to second place in cancer incidence in Taiwan. According to Department of Health statistics, about 2,855 people had colorectal cancer in 1997 and about 4,470 in 2007. Comparing these figures, the death rate for colorectal cancer has risen year by year over the last ten years.
The data for this study come from the physical examination center of a medical center in Taiwan. The study employed six data mining methods: logistic regression, decision tree J48, ANN, Random Forest, SVM, and the Naive Bayes classifier. It built a colonoscopy prediction model with each method and compared the performance of the six models.
The study found that Random Forest outperformed the other five methods, with an accuracy of 70.90%, an ROC value of 0.72, a precision of 0.75, and a recall of 0.86. The influential factors were age, gender, body mass index (BMI), triglycerides (TG), carcinoembryonic antigen (CEA), fasting glucose, and diastolic blood pressure (DIA). The results can help hospitals assess a patient's colorectal cancer risk at an early stage.
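The accuracy, precision, and recall figures reported in the abstract are standard quantities derived from a binary confusion matrix. The following is a minimal Python sketch of how such values are computed, assuming labels of 1 for an abnormal colonoscopy finding and 0 for a normal one. The names `ConfusionMatrix`, `tally`, `accuracy`, `precision`, and `recall` are illustrative choices rather than anything taken from the thesis, and the sample labels are made up; they do not reproduce the thesis data or its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # abnormal cases predicted as abnormal
    fp: int  # normal cases predicted as abnormal
    fn: int  # abnormal cases predicted as normal
    tn: int  # normal cases predicted as normal


def tally(actual, predicted, positive=1):
    """Count the four outcomes for paired actual and predicted labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


def accuracy(cm):
    """Share of all cases that were classified correctly."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    return (cm.tp + cm.tn) / total if total else 0.0


def precision(cm):
    """Share of predicted-abnormal cases that were truly abnormal."""
    denom = cm.tp + cm.fp
    return cm.tp / denom if denom else 0.0


def recall(cm):
    """Share of truly abnormal cases that the model detected."""
    denom = cm.tp + cm.fn
    return cm.tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up labels for illustration only (not thesis data).
    actual = [1, 1, 1, 0, 0, 1]
    predicted = [1, 1, 0, 1, 0, 1]
    cm = tally(actual, predicted)
    print(cm, accuracy(cm), precision(cm), recall(cm))
```

In a screening setting, recall indicates how many abnormal cases the model catches, while precision indicates how many of its positive predictions turn out to be abnormal, so the two are usually read together with accuracy.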
Chinese Abstract
English Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
1. Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Scope and Limitations
1.4 Research Questions
1.5 Research Process
2. Literature Review
2.1 Introduction to Colon Cancer
2.1.1 Risk Factors for Colon Cancer
2.1.2 Patient Screening, Evaluation, and Diagnosis
2.1.3 Colon Cancer Staging
2.1.4 Colon Polyps and Colon Cancer
2.2 Data Mining
2.2.1 Knowledge Discovery in Database (KDD)
2.2.2 Functionality of Data Mining
2.3 Definition of Classification
2.3.1 Logistic Regression (LR)
2.3.2 Decision Trees
2.3.3 Artificial Neural Network
2.3.4 Random Forest
2.3.5 SVM
2.3.6 Naive Bayes
2.3.7 Strengths and Weaknesses of the Six Algorithms
2.4 Data Mining Research in Medicine
2.5 Comparison of Classification Methods in Previous Studies
3. Research Methods
3.1 Research Steps
3.2 Research Framework
3.3 Research Subjects and Scope
3.4 Data Classification and Preprocessing
3.5 Cross-validation
3.6 Logistic Regression Model
3.7 Decision Tree J48 Algorithm
3.8 Neural Network Algorithm
3.9 Random Forest Algorithm
3.10 SVM Algorithm
3.11 Bayes: Naïve Bayes Algorithm
3.12 Performance Evaluation of the Prediction Models
4. Results and Analysis
4.1 Basic Data
4.2 Model Building
4.3 Comparison of Prediction Methods
5. Conclusions and Recommendations
5.1 Conclusions and Discussion
5.2 Recommendations
References