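Graduate student: 陳啟煌
Graduate student (English name): Chi-Huang Chen
Thesis title: 大型醫療文件資訊探勘
Thesis title (English): Large Scale Data Mining for Healthcare Documents
Advisor: 賴飛羆
Advisor (English name): Fei-Pei Lai
Degree: Doctoral
Institution: National Taiwan University
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2010
Graduation academic year: 98 (ROC calendar)
Language: English
Number of pages: 90
Keywords (Chinese): 醫療文件探勘; 語意相似度量測; 知識探勘及管理; 出院病摘系統; 醫療資訊探勘
Keywords (English): Healthcare Documents Mining; Semantic Similarity Measure; Knowledge Discovery and Management; Discharge Summary System; Healthcare Data Mining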
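As healthcare information systems have come into widespread use, many electronic records are stored in databases, including radiology reports, pathology reports, operation reports, admission notes, discharge summaries, and other medical records. Over years of operation, a healthcare institution can accumulate a considerable number of medical records. These data embody physicians' diagnostic and treatment experience and professional knowledge, so it is worthwhile to apply data mining techniques to these medical documents and extract the medical knowledge they contain.
To extract complete information from medical documents, this dissertation proposes a semantics-based keyword matching extractor; an implementation of a complete discharge summary system at National Taiwan University Hospital, with auto-complete, template, and phrase library functions; and a semantic similarity measure suited to the biomedical domain.
The semantic-driven keyword matching extractor helps medical staff extract relevant information from medical documents and store it in predefined data fields. In this system, data in the documents can be transferred into a structured database through semi-automatic matching, verification, and extraction. In addition, the design of the extractor and of the predefined template database increases the scalability of large-scale healthcare document mining. The predefined template database supports extracting relevant information for different kinds of medical research.
The web-based discharge summary system removes a drawback of the old client-server system, which required programs to be installed and updated on each client. The system also introduces several technologies applicable to web-based systems, including AJAX controls, the SRILM language model, and an inverted table lookup mechanism, which speed up the writing of discharge summaries. The new system does reduce maintenance costs and improve the efficiency of writing discharge summaries.
Using page counts from the Google search engine, this dissertation designs a corpus-based semantic similarity measure. Given two terms P and Q, we define different similarity scores, computed from the page counts the search engine returns for P, Q, and "P and Q", together with lexical patterns. A support vector machine then integrates these scores into the semantic similarity of the two terms. Experimental results show that our method achieves a correlation coefficient of 0.798 on the dataset proposed by A. Hliaoutakis. On the dataset proposed by T. Pedersen, the correlation coefficient with the diagnosis coding experts is only 0.496, but the correlation coefficient with the physicians reaches 0.705. Our method is evidently closer to the way physicians judge similarity.
The results show that the mechanism proposed in this dissertation can be applied in practice to large-scale healthcare document mining.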


With the widespread use of computerized physician order systems, many electronic medical records have accumulated in clinical databases. These records include radiology and pathology reports, operation notes, admission notes, discharge summaries, and other healthcare documents. Over a long period of operation, a hospital can collect a large volume of medical data. The data embody physicians' treatment experience and expertise, which can be extracted and learned. It is therefore worthwhile to mine knowledge from healthcare documents.
This study presents a semantic-driven keyword matching extractor and a discharge summary system designed and developed with auto-complete, model essay, and user-defined phrase functions. To obtain fully relevant information from medical documents, a semantic similarity measure is introduced as well.
The semantic-driven extractor guides clinicians in extracting the appropriate data from textual clinical documents into case-oriented templates. In the developed system, matched information in the documents is structured semi-automatically through matching, verification, and extraction. In addition, the design of the information matching modules and the case-oriented templates increases the scalability of the system for large-scale healthcare document mining. The case-oriented templates also support collecting the extracted data for various kinds of medical research.
The web-based discharge summary system eliminates a drawback of the client-server architecture by avoiding installation and upgrades at each client. Moreover, it introduces technologies suited to a web-based architecture, including the AJAX Control Toolkit, SRILM, and an inverted table. These features expedite the editing of discharge summaries. The new system reduces costs and increases productivity.
A corpus-based semantic similarity measure is built on page counts from the Google search engine. We define various similarity scores for two given terms P and Q, using the page counts returned for the queries P, Q, and P AND Q. We also compute semantic similarity using lexico-syntactic patterns with page counts. These similarity scores are integrated using support vector machines to make the measure more robust. In experiments on two datasets, the method achieves a correlation coefficient of 0.798 on the dataset provided by A. Hliaoutakis, and of 0.705 with physician scores and 0.496 with expert scores on the dataset provided by T. Pedersen et al. The method thus correlates much more closely with the physicians' scores than with the experts'.
As a result of this research, we provide a mechanism for large-scale healthcare document mining.
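The inverted table mentioned above can be illustrated with a minimal sketch. The record does not describe the thesis's actual data structures, so everything here is an assumption: the function names `build_inverted_index` and `lookup` are hypothetical, phrases are split on whitespace, and a query matches a phrase only if every query token occurs in it.

```python
from collections import defaultdict


def build_inverted_index(phrases):
    """Map each lowercase token to the set of phrase ids that contain it."""
    index = defaultdict(set)
    for phrase_id, phrase in enumerate(phrases):
        for token in phrase.lower().split():
            index[token].add(phrase_id)
    return index


def lookup(index, phrases, query):
    """Return the stored phrases that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return []
    # Intersect the posting sets of all query tokens; unknown tokens give an empty set.
    matching_ids = set.intersection(*(index.get(token, set()) for token in tokens))
    return [phrases[i] for i in sorted(matching_ids)]


# Hypothetical phrase library entries, for illustration only.
PHRASES = [
    "acute myocardial infarction",
    "acute renal failure",
    "chronic renal failure",
]
INDEX = build_inverted_index(PHRASES)
```

Looking up a term in such a table costs one dictionary access per query token rather than a scan over every stored phrase, which is the usual reason to use an inverted structure for retrieval while typing.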
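To make the page-count idea concrete, the sketch below computes three co-occurrence scores from the counts for P, Q, and P AND Q. The abstract does not state which score definitions the thesis uses, so the Jaccard, Dice, and overlap forms here are assumptions chosen as common page-count formulations, and all function names are hypothetical. The counts are passed in as plain integers; no search engine is queried.

```python
def web_jaccard(count_p, count_q, count_pq):
    """Jaccard-style score: hits(P AND Q) / (hits(P) + hits(Q) - hits(P AND Q))."""
    denominator = count_p + count_q - count_pq
    if count_pq <= 0 or denominator <= 0:
        return 0.0
    return count_pq / denominator


def web_dice(count_p, count_q, count_pq):
    """Dice-style score: 2 * hits(P AND Q) / (hits(P) + hits(Q))."""
    total = count_p + count_q
    if count_pq <= 0 or total <= 0:
        return 0.0
    return 2 * count_pq / total


def web_overlap(count_p, count_q, count_pq):
    """Overlap-style score: hits(P AND Q) / min(hits(P), hits(Q))."""
    smaller = min(count_p, count_q)
    if count_pq <= 0 or smaller <= 0:
        return 0.0
    return count_pq / smaller


def similarity_features(count_p, count_q, count_pq):
    """Collect the scores into one feature vector for a downstream learner."""
    return [
        web_jaccard(count_p, count_q, count_pq),
        web_dice(count_p, count_q, count_pq),
        web_overlap(count_p, count_q, count_pq),
    ]
```

A feature vector of this kind, extended with pattern-based scores, is the sort of input a support vector machine could combine into a single similarity value, as the abstract describes.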


Oral Examination Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Motivation and Objectives
1.2. Background
Chapter 2 Related Works
2.1. Semantic Similarity Measure
2.2. Semantic-driven Keyword Matching Extractor
2.3. Discharge Summaries
2.4. Healthcare Data Mining
Chapter 3 Semantic Similarity Measure
3.1. Introduction
3.2. Background
3.2.1. Google AJAX Search API
3.2.2. Support Vector Machine
3.3. Design and Methodology
3.3.1. Data Preparation and Collection
3.3.2. Feature Definitions
3.3.3. Feature Selection Strategy
3.3.4. Machine Learning Algorithms
3.4. Experiments
3.4.1. Experiment Environment
3.4.2. Datasets
3.4.3. Optimization and Calibration
3.5. Results
3.6. Discussion
Chapter 4 Semantic-driven Keyword Matching Extractor
4.1. Introduction
4.2. Methodology
4.2.1. Overall Process of the Data Extraction
4.2.2. Semantic-driven Keyword Matching Methodology
4.2.3. Case-oriented Template
4.3. Achievements
4.3.1. Textual Documents Viewer
4.3.2. Semantic-driven Keyword Matching Modules
4.3.3. Extraction Verification Editor
4.4. Discussion
Chapter 5 Web-based Discharge Summary System
5.1. Introduction
5.1.1. Motivation and Objective
5.2. Backgrounds
5.2.1. NTUH Health Information System
5.2.2. AJAX Control Toolkit
5.2.3. SRI Language Modeling Toolkit (SRILM)
5.2.4. Inverted Index
5.3. System Design and Implementation
5.3.1. Architecture Overview
5.3.2. Features and Functionalities
5.3.3. Flow Chart
5.3.4. Implementation and Scenario
5.4. Achievements and Discussions
5.4.1. Comparison of WebDSS and Dis32
5.4.2. Performance of Inverted Table in Retrieval
5.4.3. Preliminary System Assessments
Chapter 6 Data Mining Project with Mongolia
6.1. Introduction
6.2. Methods
6.2.1. Research Principles and Methods
6.2.2. Data
6.3. Achievements
6.3.1. Data Mining Framework System Architecture
6.3.2. Clinical Data Pre-process Module
6.3.3. Data Mining Models
6.3.4. Open Standard of Clinical Pathway and Guideline
6.3.5. Server Virtualization and Cloud Computing
6.4. Discussions
Chapter 7 Conclusions and Future Works
7.1. Conclusions
7.2. Future Works
References
Appendix




