National Digital Library of Theses and Dissertations in Taiwan

Student: 蔡承洧 (Tsai, Cheng-Wei)
Thesis title: Scaffolding Pre-Assembled Contigs Using Long-Read Sequencing (Chinese title: 以長序列定序技術拼接已初步組合之片段)
Advisor: 黃耀廷 (Huang, Yao-Ting)
Oral defense committee: 黃耀廷 (Huang, Yao-Ting); 莊樹諄 (Chuang, Trees-Juen); 蔡懷寬 (Tsai, Huai-Kuang)
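Defense date: 2013-07-30
Degree: Master's
Institution: National Chung Cheng University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2013
Graduating academic year: 101 (ROC calendar, 2012–2013)
Language: English
Number of pages: 33
Keywords (Chinese): genome assembly; third-generation sequencing
Keywords (English): assembly; third-generation sequencing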
Usage statistics:
  • Cited: 0
  • Views: 432
  • Rating:
  • Downloads: 6
  • Bookmarked: 0
In recent years, third-generation sequencing platforms have been applied to improve genome assembly; these platforms sequence single DNA molecules in real time and generate longer reads than earlier technologies. Unfortunately, these long reads often have higher error rates than reads from previous sequencing technologies, and most of the errors are indels. The high error rates greatly reduce the usefulness of long reads for improving genome assembly. In this thesis, we design and implement SACLR, a program for scaffolding pre-assembled contigs using long reads generated by the Pacific Biosciences platform. Given a set of pre-assembled contigs and long reads, SACLR determines the mapped boundaries of the contigs within the long reads using a novel clustering alignment approach that tolerates the various errors of the platform. Linkages between contigs are then established across multiple long reads and integrated to further increase scaffold length. Notably, the gaps within our scaffolds can be filled directly, and the two ends of each scaffold can be further extended, using long reads. SACLR has been tested on a variety of real data sets, and the experimental results show that it produces more contiguous and accurate sequences.
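The abstract states only that SACLR locates each contig's mapped boundary on a long read with a clustering alignment approach that tolerates the platform's errors (mostly indels); it does not describe the algorithm. As an illustration of the general idea, and not SACLR's actual method, here is a minimal Python sketch. It assumes that seed hits (shared k-mers) between one contig and one read are already available, handles the forward strand only, clusters hits whose diagonal drifts by at most a few bases, and extrapolates the largest cluster out to the contig's ends. The names `Boundary`, `cluster_hits`, `contig_boundary` and the parameters `diag_tol`, `max_gap`, `min_support` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical seed hit: a shared k-mer at contig_pos in the contig and read_pos in the read.
Hit = Tuple[int, int]  # (contig_pos, read_pos), forward strand only


@dataclass
class Boundary:
    read_start: int  # read coordinate where the contig would begin (may be < 0)
    read_end: int    # read coordinate one past where the contig would end
    support: int     # number of hits in the chosen cluster


def cluster_hits(hits: List[Hit], diag_tol: int = 50, max_gap: int = 500) -> List[List[Hit]]:
    """Chain hits into clusters that follow a roughly constant diagonal.

    Hits are visited in read order. Each hit joins the open cluster whose last
    hit has the closest diagonal (read_pos - contig_pos), provided the drift is
    at most diag_tol and the hit lies no more than max_gap bases downstream on
    both sequences. The drift tolerance absorbs offsets introduced by indels.
    """
    clusters: List[List[Hit]] = []
    for c, r in sorted(hits, key=lambda h: (h[1], h[0])):
        best: Optional[List[Hit]] = None
        best_drift = diag_tol + 1
        for cl in clusters:
            pc, pr = cl[-1]
            drift = abs((r - c) - (pr - pc))
            if drift < best_drift and 0 <= r - pr <= max_gap and 0 <= c - pc <= max_gap:
                best, best_drift = cl, drift
        if best is None:
            clusters.append([(c, r)])  # start a new cluster
        else:
            best.append((c, r))
    return clusters


def contig_boundary(hits: List[Hit], contig_len: int, min_support: int = 3,
                    diag_tol: int = 50, max_gap: int = 500) -> Optional[Boundary]:
    """Place the contig on the read using its largest hit cluster."""
    clusters = cluster_hits(hits, diag_tol, max_gap)
    if not clusters:
        return None
    best = max(clusters, key=len)
    if len(best) < min_support:
        return None  # too little evidence; treat the contig as unmapped
    first_c, first_r = best[0]
    last_c, last_r = best[-1]
    # Extrapolate from the first and last hits of the cluster to the contig's own ends.
    return Boundary(read_start=first_r - first_c,
                    read_end=last_r + (contig_len - last_c),
                    support=len(best))


if __name__ == "__main__":
    # Five hits near diagonal 200 (drifting by a few bases) plus one spurious hit.
    hits = [(100, 300), (200, 402), (300, 499), (400, 605), (500, 703), (50, 900)]
    print(contig_boundary(hits, contig_len=1000))
    # Boundary(read_start=200, read_end=1203, support=5)
```

In this sketch, a `read_start` below 0 (or a `read_end` past the read length) means the contig extends beyond that end of the read. The choice of `diag_tol` trades tolerance of indel drift against the risk of merging hits from nearby repeats into one cluster.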
Table of Contents:
1 Introduction
2 Literature Review
2.1 Introduction to Next-Generation Sequencing
2.2 Introduction to De Novo Assembly
2.3 Introduction to Pacific Biosciences Long Reads
3 Material and Method
3.1 Determination of Boundaries of Contigs within Long Reads
3.1.1 Problem Formulation
3.1.2 Optimization of Maximum Interval Sum of Two Arrays and Filtration
3.2 Construction and Simplification of a Scaffolding Graph
3.2.1 Scaffolding Graph Constructed by Multiple Long Reads
3.2.2 Path Traversal of Scaffolding Graph
3.2.3 Simplification of Scaffolding Graph
3.3 Gap Closure via Long Reads
4 Results and Discussion
4.1 Improvement of SACLR Using Low-Coverage PacBio Sequencing
4.2 Result of Different
4.3 Result of Scaffold Accuracy
5 Conclusion and Future Works
5.1 Conclusion
5.2 Future Works
A Supplementary
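The abstract says that linkages between contigs are established across multiple long reads and integrated, and sections 3.2.1 to 3.2.3 of the outline build, traverse, and simplify a scaffolding graph. The record does not give the thesis's actual rules, so the following is only a minimal Python sketch of one common way to do this, under stated assumptions: each long read has already been reduced to a list of `(contig_id, read_start, read_end)` placements, contig orientation is ignored, an edge needs at least `min_support` reads, branches are resolved by keeping only edges that are the best-supported choice at both endpoints, and the gap between linked contigs is the median of the per-read estimates. `build_links`, `scaffold`, and `min_support` are hypothetical names.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

Placement = Tuple[str, int, int]  # (contig_id, read_start, read_end) on one long read


def build_links(reads: List[List[Placement]]) -> Dict[Tuple[str, str], List[int]]:
    """Collect gap estimates for every pair of contigs adjacent on some read."""
    links: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for placements in reads:
        ordered = sorted(placements, key=lambda p: p[1])
        for (a, _, a_end), (b, b_start, _) in zip(ordered, ordered[1:]):
            if a != b:
                links[(a, b)].append(b_start - a_end)  # a negative gap means overlap
    return links


def scaffold(reads: List[List[Placement]], min_support: int = 2) -> List[List[Tuple[str, int]]]:
    """Return scaffolds as lists of (contig_id, estimated gap to the next contig)."""
    links = {e: g for e, g in build_links(reads).items() if len(g) >= min_support}
    # Keep an edge only if it is the best-supported choice at both endpoints;
    # this drops branches caused by repeats or chimeric reads.
    best_out: Dict[str, Tuple[str, str]] = {}
    best_in: Dict[str, Tuple[str, str]] = {}
    for (a, b), gaps in links.items():
        if a not in best_out or len(gaps) > len(links[best_out[a]]):
            best_out[a] = (a, b)
        if b not in best_in or len(gaps) > len(links[best_in[b]]):
            best_in[b] = (a, b)
    kept = {a: b for (a, b) in best_out.values() if best_in.get(b) == (a, b)}
    has_pred = set(kept.values())

    nodes = {c for placements in reads for c, _, _ in placements}
    # Walk from contigs without a predecessor first; leftover nodes can only be on cycles.
    order = sorted(n for n in nodes if n not in has_pred) + sorted(nodes)
    scaffolds: List[List[Tuple[str, int]]] = []
    visited = set()
    for start in order:
        if start in visited:
            continue
        path, node = [], start
        while node is not None and node not in visited:
            visited.add(node)
            nxt = kept.get(node)
            if nxt is not None and nxt in visited:
                nxt = None  # stop instead of looping around a cycle
            gap = int(median(links[(node, nxt)])) if nxt is not None else 0
            path.append((node, gap))
            node = nxt
        scaffolds.append(path)
    return scaffolds


if __name__ == "__main__":
    reads = [
        [("c1", 0, 500), ("c2", 550, 1200)],
        [("c1", 100, 600), ("c2", 660, 1300), ("c3", 1400, 2000)],
        [("c2", 0, 400), ("c3", 480, 900)],
    ]
    print(scaffold(reads))
    # [[('c1', 55), ('c2', 90), ('c3', 0)]]
```

Requiring mutual best support keeps every contig with at most one successor and one predecessor, so each scaffold is a simple path; a link seen on only one read (for example, from a chimeric read) is filtered out by `min_support` and the contig it points to stays a separate scaffold.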

