National Digital Library of Theses and Dissertations in Taiwan

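Author: 吳勁廷 (Chin-Ting Wu)
Thesis title: Development of a Fast Algorithm for Pathogen Identification through RNA-seq
Thesis title (Chinese): 發展一運用RNA定序資料鑑定病原體之演算法
Advisor: 莊曜宇
Oral defense committee: 蔡孟勳, 賴亮全, 盧子彬, 蕭朱杏, 蕭自宏
Oral defense date: 2015-07-28
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Biomedical Electronics and Bioinformatics
Discipline: Engineering
Field: Biomedical engineering
Thesis type: Academic thesis
Year of publication: 2015
Academic year of graduation: 103
Language: English
Number of pages: 54
Keywords (Chinese): 次世代定序, RNA定序, 病原體
Keywords (English): next generation sequencing, RNA-seq, pathogen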
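Abstract (translated from the Chinese original):
Early diagnosis of infectious diseases caused by pathogens such as viruses, bacteria, and fungi is one of the major topics in current clinical research. Beyond traditional strain and virus identification methods, advances in next-generation sequencing have made sequencing-based searches for candidate pathogens an effective means of identification. Research groups around the world have developed several methods for this task. However, these algorithms consume large amounts of computing time and resources, which makes them difficult to apply in practice. We therefore developed a new, high-performance algorithm for pathogen identification.
The algorithm takes RNA sequencing data and identifies pathogen sequence fragments in four steps. First, the sequencing reads are aligned to the human reference sequence, and the non-human reads are retained for further analysis. Second, the non-human reads are assembled de novo: overlapping reads are extended and joined into long sequence fragments. Third, a statistical model is used to assess the accuracy of the assembly. Finally, the long fragments that pass the statistical test are searched with BLAST to confirm their species of origin.
We evaluated the algorithm on simulated data and on experimental RNA-seq data. On both, it showed high accuracy and sensitivity. In a comparison with three other algorithms, ours was computationally more efficient. We also applied the method to cervical cancer, lung adenocarcinoma, and colorectal cancer datasets to look for pathogens that may be associated with these cancers, and the analysis identified candidate pathogens for each cancer type.
In summary, the algorithm developed in this study detects candidate pathogens from RNA-seq data accurately and efficiently. Its computational efficiency is very good and it effectively shortens working time. We believe it will aid research on pathogen detection.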


Abstract (English original):
Diagnosing viral, bacterial, or fungal pathogens at an early stage of infectious disease is an important issue in clinical research. In addition to strain or virus identification by traditional, labor-intensive in vitro experiments, in silico methods for pathogen identification have been developed thanks to advances in next-generation sequencing. Research groups around the world have developed several such methods. However, these in silico methods are still time-consuming and compute-intensive, which creates practical obstacles. To address these issues, we developed an accurate and efficient algorithm for pathogen identification.
Here we present a novel algorithm that identifies pathogens from RNA-seq data in four steps. First, the sequence reads are aligned to the human reference genome, and those that cannot be aligned are retained for subsequent analysis. Second, the retained reads are assembled into pathogen contigs using the overlapping regions of the reads. Third, a statistical model is applied to the putative transcript contigs to remove fake contigs resulting from random assembly. Finally, BLAST is applied to the contigs that pass the statistical test to identify the species and strains of the pathogens.
To evaluate performance, we used both simulated and real data sets that contain samples with pathogen infections. The results on both show that our algorithm has high sensitivity and accuracy. We compared our method with three other methods and found that ours has higher computational efficiency. Furthermore, we applied our method to cervical cancer, lung adenocarcinoma, and colorectal cancer datasets to identify possible pathogens associated with these three kinds of cancer.
In summary, our method is accurate and effective in detecting pathogens using RNA-seq data from patient samples. Moreover, its efficiency and short working time enable the use of large data sets in pathogen studies.
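
The four steps described in the abstract can be illustrated with a toy sketch. This is not the thesis implementation: exact substring matching stands in for read alignment, a greedy suffix-prefix merge stands in for de novo assembly, a minimum read-support count stands in for the statistical model (which the abstract does not specify), and a dictionary lookup stands in for BLAST. All function names, parameters, and sequences below are hypothetical.

```python
def remove_host_reads(reads, host):
    """Step 1 (stand-in for alignment): keep reads not found in the host sequence."""
    return [r for r in reads if r not in host]


def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b, or 0 if < min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads, min_len=4):
    """Step 2 (stand-in for de novo assembly): greedily merge the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def filter_contigs(contigs, reads, min_reads=2):
    """Step 3 (placeholder for the statistical test): require support from several reads."""
    return [c for c in contigs if sum(r in c for r in reads) >= min_reads]


def identify(contigs, references):
    """Step 4 (stand-in for BLAST): report references that contain each contig."""
    return {c: [name for name, seq in references.items() if c in seq] for c in contigs}


def identify_pathogens(reads, host, references):
    non_host = remove_host_reads(reads, host)
    contigs = filter_contigs(assemble(non_host), non_host)
    return identify(contigs, references)


# Hypothetical toy data: two host reads, three overlapping "virus" reads, one noise read.
host = "TTGACCAGTAGGCTAACGTTAGC"
references = {"toy_virus": "ATGCGTACCTGAGTCA", "toy_bacterium": "GGGCCCTTTAAAGGGCCC"}
reads = ["GACCAGTA", "ATGCGTAC", "GGCTAACG", "CGTACCTG", "CCTGAGTC", "GGATCCAA"]
result = identify_pathogens(reads, host, references)
print(result)
```

In this toy run the two host reads are discarded in step 1, the three overlapping reads merge into a single contig, the unsupported noise read is dropped in step 3, and the surviving contig is matched against the hypothetical reference set. A real pipeline would replace each stand-in with a proper aligner, assembler, statistical test, and BLAST search.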


CONTENTS

Acknowledgements
Chinese Abstract
Abstract
Chapter 1 Introduction
1.1 Background
1.1.1 Infectious disease and pathogen identification
1.1.2 Utilization of sequencing technology for pathogen identification
1.2 Background survey
1.2.1 PathSeq
1.2.2 RINS
1.2.3 CaPSID
1.2.4 Virana
1.3 Motivation and specific aims
Chapter 2 Materials and Methods
2.1 Reads alignment
2.2 Sequence assembly
2.3 Fake contigs removal
2.4 Species identification
2.5 Calculation of relative error and accuracy
2.6 Data preprocessing and simulation
2.6.1 Data preprocessing
2.6.2 Human RNA-seq data simulation
2.6.3 Pathogen RNA-seq data simulation
2.7 Datasets
2.7.1 Lung tissue infected by Varicella-Zoster Virus
2.7.2 Cervical cancer datasets
2.7.3 Lung adenocarcinoma datasets
2.7.4 Metagenomic study of colorectal cancer
Chapter 3 Results
3.1 Distribution modeling
3.2 Performance evaluation
3.3 Method comparison
3.4 RNA-seq of cell line with virus infection
3.5 Application to cervical cancer
3.6 Application to lung adenocarcinoma
3.7 Application to colorectal cancer
Chapter 4 Discussion
4.1 RNA extraction protocol
4.2 Model reliability
4.3 Method comparison
4.4 Limitations of Patho-finder
4.5 Interpretation of results
4.6 Metagenomic study
Chapter 5 Conclusion
Chapter 6 References





