National Digital Library of Theses and Dissertations in Taiwan

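Graduate student: 吳明侃
Graduate student (romanized): Ming-Kan Wu
Thesis title: 建構以國際疾病編碼分類遺傳疾病系統雛型:以編碼第二章為例
Thesis title (English): A prototype classification schema of ICD-genetic disorder system: Chapter 2 neoplasm as an example
Advisor: 林文昌
Advisor (romanized): Wen-Chang Lin
Degree: Master's
Institution: National Yang-Ming University
Department: Institute of Biomedical Informatics
Discipline: Engineering
Field: Biomedical engineering
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: Chinese
Pages: 145
Keywords (translated from Chinese): text mining; disease classification coding; human genetic disease database
Keywords (English): OMIM; ICD-10; Text mining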
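OMIM (Online Mendelian Inheritance in Man) is a widely used database of human genetic diseases, an online literature database edited by the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine. In current genetic medicine research, it provides lookup of the genes and disease phenotypes related to human genetic diseases. ICD-10 (International Classification of Diseases, version 10) is a disease classification coding scheme compiled manually by the WHO through the health organizations of 43 countries. Its purpose is to give medical staff a standardized basis for classification during care. Compared with the previous version (ICD-9), ICD-10 adds more codes for clinical diseases.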
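In research to date, OMIM-based data classification tools have mostly been organized by body system. On that basis, users must already know how the relevant basic medical knowledge relates to diseases, so such tools offer limited real-time help to medical staff in clinical care. In countries with relatively complete health care systems, ICD codes account for most of the classification standardization in hospital information systems. In short, if ICD serves as the classification standard for the OMIM database, medical staff can query OMIM data directly with the ICD codes on a patient's medical record to support medical decisions. Genetic disease researchers can likewise look up the genetic diseases and genes that interest them and quickly obtain the standardized hospital codes.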
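This study aims to build a database by automatically integrating OMIM and ICD-10 and to provide query tools for it. By using a classification method different from earlier ones, it hopes to give users more accurate and convenient queries and a different perspective on OMIM data. The system processes OMIM data with text mining and automatically extracts keywords to classify the data under each OMIM number by ICD-10 code. After either database is updated, no manual curation is needed; rerunning the database update yields the updated content.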
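Once the database is built, query tools and a visualization interface are built on the code ordering within the database and on the data fields of the OMIM database (gene name, disorder name). In the future, the frequency of each item will be counted and clustered by ICD code and by gene to provide further statistics, with the aim of yielding deeper genetic information.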
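As a rough illustration of the frequency statistics described above, the following Python sketch counts matched records per ICD code and per gene and groups genes under each ICD code. The record layout, the field names (`omim`, `gene`, `icd`), and the placeholder values are assumptions made for this example; they are not taken from the thesis.

```python
from collections import Counter, defaultdict

# Hypothetical matched records; identifiers are placeholders, not real OMIM data.
MATCHES = [
    {"omim": "entry-1", "gene": "GENE_A", "icd": "C50"},
    {"omim": "entry-2", "gene": "GENE_B", "icd": "C50"},
    {"omim": "entry-3", "gene": "GENE_A", "icd": "C73"},
]

def count_by(matches, field):
    """Count how many matched records share each value of `field`."""
    return Counter(m[field] for m in matches)

def genes_by_icd(matches):
    """Group the distinct gene names found under each ICD code."""
    groups = defaultdict(set)
    for m in matches:
        groups[m["icd"]].add(m["gene"])
    return {code: sorted(genes) for code, genes in groups.items()}

if __name__ == "__main__":
    print(count_by(MATCHES, "icd"))
    print(count_by(MATCHES, "gene"))
    print(genes_by_icd(MATCHES))
```

Counting by `icd` gives a per-code view and counting by `gene` gives a per-gene view of the same matches, which is one simple way to read the "by ICD code and by gene" statistics mentioned in the abstract.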
Online Mendelian Inheritance in Man (OMIM™), a well-known online database of human genetic disorders, provides an interactive query service for the disorders and genes related to human genetic diseases. The International Classification of Diseases, version 10 (ICD-10), is a coding schema for diseases, signs, symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or disease, compiled by 43 countries at the World Health Organization (WHO). The code set provides a classification standard for medical processes and makes significantly more clinical codes available than the previous ICD-9 classification.
Most classification tools built on the OMIM dataset for disease-related genomic research are organized by human physiological system. These tools, including their user interfaces and data structures, are therefore not oriented toward clinical professionals. The ICD system, however, is widely used in most countries with modern health care systems and serves as a standard for classifying medical information in hospitals. Using the ICD schema as the classification standard for the OMIM disease gene dataset could therefore give clinical researchers a way to browse OMIM information by clinical relevance, and help basic biological scientists learn the clinical significance of the genes they are interested in.
This study constructs such a database by automatically integrating the OMIM information with the ICD-10 classification standard, and provides search and query tools for it. With this classification method, the database could give users more relevant search results and different perspectives on OMIM data. OMIM was processed with text-mining techniques; keywords were tokenized and collected for subsequent automatic classification under the ICD-10 schema. An automated bioinformatics pipeline was established so that, when OMIM or ICD-10 is updated, users can obtain the newest content without additional manual intervention. In addition to disease and gene statistics, search tools and interface visualizations will be established in the near future for basic research scientists and clinical professionals.
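The keyword-based classification outlined in the abstract can be pictured with a minimal Python sketch: tokenize a text, drop stop words, and score each ICD-10 category by the keywords it shares with the OMIM text. This is a simplified stand-in under stated assumptions, not the thesis's actual pipeline. The stop-word list, the function names, and the sample OMIM-style description are all hypothetical; only the three ICD-10 category titles are real.

```python
import re

# Hypothetical stop-word list; the list actually used in the thesis is not given.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "with", "to", "or", "is"}

# Three ICD-10 Chapter 2 (neoplasm) categories used as a toy target set.
ICD_TITLES = {
    "C18": "Malignant neoplasm of colon",
    "C50": "Malignant neoplasm of breast",
    "C73": "Malignant neoplasm of thyroid gland",
}

def extract_keywords(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def score_match(omim_text, icd_title):
    """Count the distinct keywords shared by an OMIM text and an ICD title."""
    return len(set(extract_keywords(omim_text)) & set(extract_keywords(icd_title)))

def classify(omim_text, icd_titles):
    """Return (code, score) pairs with a positive score, best score first."""
    scored = [(code, score_match(omim_text, title))
              for code, title in icd_titles.items()]
    hits = [(code, score) for code, score in scored if score > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

if __name__ == "__main__":
    # Made-up OMIM-style description, for illustration only.
    print(classify("Familial breast cancer with early onset", ICD_TITLES))
```

Because the scoring relies only on the two text sources, rerunning `classify` after either source changes reproduces the mapping without manual curation, which is the property the abstract emphasizes. A real system would need synonym handling (for example "cancer" versus "malignant neoplasm"), which this sketch deliberately omits.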
Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
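Chapter 1 Introduction
1.1 Research motivation
1.2 Research objectives
1.3 Research background
1.3.1 Heredity
1.3.2 Genetic diseases
1.3.3 Clinical disease classification
1.3.4 Cancer and heredity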
Chapter 2 Literature Review
2.1 Database literature
2.1.1 NCBI (National Center for Biotechnology Information)
2.1.2 OMIM
2.2 Disease classification coding literature
2.2.1 World Health Organization (WHO)
2.2.2 International Classification of Diseases (ICD)
2.2.3 International Classification of Diseases, 10th revision (ICD-10)
2.3 Classification tools targeting OMIM
2.3.1 CGMIM
2.3.2 GFINDer
2.3.3 Websites for single genetic diseases
Chapter 3 Methods
3.1 Natural language processing
3.2 Text mining
3.3 Research framework
3.4 Data sources: details and current status
3.4.1 Lexicons included in the system
3.4.2 OMIM data
3.4.3 ICD data
3.5 Data preprocessing
3.5.1 Data splitting
3.5.2 Stop-word removal
3.5.3 Keyword extraction
3.5.4 Subordinate keyword extraction
3.5.5 Cross-validation training
3.5.6 Hierarchical structuring of the lexicon
3.6 OMIM-ICD matching
3.6.1 Extraction of ICD code information
3.6.2 Matching OMIM keywords against ICD keywords
3.7 Ontology data organization
3.7.1 Match frequency statistics
3.7.2 Match scoring
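Chapter 4 System Architecture
4.1 Architecture design
4.2 Database linkage
4.2.1 Table status
4.2.2 Table linkage
4.3 Search tools
4.3.1 Tree-structure click search
4.3.2 Text-input search
4.4 Visualization interface implementation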
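Chapter 5 Results and Discussion
5.1 Results
5.1.1 Data preprocessing
5.1.2 Data distribution
5.1.3 Validation
5.2 Discussion
5.2.1 Comparison with the CGMIM data distribution
5.2.2 Systematic horizontal and vertical OMIM queries via ICD classification
5.2.3 Match scoring criteria
5.2.4 Data updates
Chapter 6 Future Work and Conclusions
References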
Bajdik CD, Kuo B, Rusaw S, Jones S, Brooks-Wilson A. (2005). CGMIM: automated text-mining of Online Mendelian Inheritance in Man (OMIM) to identify genetically-associated cancers and candidate genes. (Cancer Control Research Program, BC Cancer Agency, 600 West 10th Avenue, Vancouver BC).
Barbara Jo White, Jenny Fortier, Danial Clapper, Pierre Grabolosa (2007). The impact of domain-specific stop-word lists on ecommerce website search performance. Journal of Strategic E-Commerce 5, 1-2
Christopher D. Manning and Hinrich Schütze (1999). Foundations of Statistical Natural Language Processing. Promotional Web Site for the Book (MIT Press).
Daniel Jurafsky and James H. Martin (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
E. F. Codd (1970). A relational model of data for large shared data banks. Communications of the ACM 13, 377-387
Kim JD, Ohta T, Tateisi Y, Tsujii J. (2003). GENIA corpus—a semantically annotated corpus for bio-textmining. Bioinformatics 19, 180–182
Masseroli M., Martucci D., Pinciroli F. (2004). GFINDer: Genome Function INtegrated Discoverer through dynamic annotation, statistical analysis, and mining. Nucleic Acids Research 33, 717-723
Michael Stonebraker (1981). Operating system support for database management. Communications of the ACM 24, 412-418
P. Griffiths Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, T. G. Price (1979). Access path selection in a relational database management system. International Conference on Management of Data Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Pasini B., Ceccherini I., Romeo G. (1996). RET mutations in human disease. Trends Genet. 12, 138-44.
Radice P., Pierotti MA. (1997). Molecular genetics of breast cancer. Nucl. Med. 41, 189-99.
Richard Madden, Catherine Sykes, Bedirhan Ustun (2008). WHO-FIC definition, scope and purpose. (WHO)
Ronen Feldman, Moshe Fresko, Yakkov Kinar, Yehuda Lindell, Orly Liphstat, Martin Rajman, Yonatan Schler, Oren Zamir (1998). Text Mining at the Term Level. Lecture Notes In Computer Science 1510, 65-73
陳麗華 (2008). Briefing & Promoting ICD-10 in Taiwan. (Statistics Office, Department of Health, Executive Yuan).