National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

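Author: 黃亦晨 (Yi-Chen Huang)
Title: Gap-Centered Local Assembly: Gap Filling in Genome Drafts with Linked Reads
Title (Chinese): 針對基因組裝中欠缺的間隙區域以標記性長片段做填補
Advisor: 趙坤茂
Oral examination committee: 何建明, 林仲彥
Oral defense date: 2019-07-29
Degree: Master's
Institution: National Taiwan University (國立臺灣大學)
Department: Graduate Institute of Biomedical Electronics and Bioinformatics (生醫電子與資訊學研究所)
Discipline: Engineering
Subfield: Biomedical Engineering
Thesis type: Academic thesis
Year of publication: 2019
Graduation academic year: 107 (ROC calendar)
Language: English
Number of pages: 41
Keywords (translated from Chinese): genome assembly; genome gaps; gap filling; GemCode platform; barcode sequences
DOI: 10.6342/NTU201902537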
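Abstract (translated from Chinese): Because of limits on read length and assembly techniques, next-generation sequencing (NGS) leaves gaps within and between the sequences of a genome draft, that is, stretches of sequence that are unknown and not yet determined. Resolving these gaps makes it possible to find more genes in the assembly, which both raises its gene content and yields a more complete genome assembly. This thesis uses a technology developed by 10x Genomics that combines barcode tagging with short-read sequencing to obtain long-fragment information (linked reads). It attempts to exploit the properties of this data to cope with the long-standing problem of repeats, and fills and repairs gaps by resolving the gap sequences in a genome assembly. Experimental results show that the gap regions in the Japanese eel assembly were effectively reduced from 11.8% to 4.5%, and that the gene content of the assembly increased substantially.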
Abstract (English): De novo assembly of short reads followed by scaffolding often produces many gaps, because there is too little information to determine the sequences between contigs; the result is an incomplete genome draft. Here, we present a protocol for filling gaps in a genome that uses the linked-read information carried by barcoded short reads. We validate the protocol on the Japanese eel genome and show how gaps can be filled from 10x Chromium linked reads with our local assembly methods. We expect this gap-filling protocol to be useful for obtaining more complete, higher-quality scaffolds in genome assembly drafts.
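The abstract reports its result as a drop in the percentage of gap regions. As a minimal sketch of how such a percentage might be measured, the Python below assumes that gaps are stored as runs of `N` in the scaffold sequences, a common convention in draft assemblies that this record does not confirm for the thesis. The function names `find_gaps` and `gap_fraction` and the toy sequences are hypothetical and are not taken from the thesis pipeline.

```python
import re


def find_gaps(seq, min_len=1):
    """Return half-open (start, end) intervals for runs of N/n in seq.

    Assumption: a gap is a run of at least min_len 'N' characters.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def gap_fraction(scaffolds):
    """Fraction of all bases in the scaffolds that fall inside gaps."""
    total = sum(len(seq) for seq in scaffolds.values())
    gap_bases = sum(
        end - start
        for seq in scaffolds.values()
        for start, end in find_gaps(seq)
    )
    return gap_bases / total if total else 0.0


# Toy draft with two scaffolds (hypothetical data, not from the thesis).
draft = {
    "scaffold_1": "ACGTNNNNACGT",
    "scaffold_2": "NNACGTAC",
}

print(find_gaps(draft["scaffold_1"]))   # gap intervals in scaffold_1
print(f"{gap_fraction(draft):.1%}")     # share of gap bases in the draft
```

Running the same measurement on a draft before and after gap filling would give a before/after comparison of the kind quoted in the abstract, under the stated assumption about how gaps are encoded.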
Table of contents:
Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
Chapter 2 Methods and Materials
2.1 Basic Idea
2.2 Strategies of contig construction
2.2.1 Scaffold-based bin algorithm
2.2.2 Gap-based bin algorithm
2.2.3 Window-based bin algorithm
2.2.4 Alternative strategy: De novo assembly
2.3 Gap Filling Pipeline
2.3.1 Data Preprocessing
2.3.2 Contig Construction
2.3.3 Gap Filling Algorithm
2.3.4 Data Sources
Chapter 3 Results
3.1 Barcoded Linked Reads Preprocessing
3.2 Test the Gap Filling on Four-scaffold Testset
3.2.1 Barcode Selection
3.2.2 Effects of mark duplicate
3.2.3 Partition Algorithms
3.3 Results on L95 Set of Eel Assembly
3.3.1 Gap Filling Result
3.3.2 Biological Quality Assessment
Chapter 4 Discussion
Chapter 5 Conclusion
REFERENCE