Graduate student (English): Wang, Hsiaowen
Thesis title (English): Comparative Scaffolding and Gap Closure Using Next Generation Sequencing
Advisor (English): Huang, Yaoting
Oral defense committee (English): Chuang, Treesjuen; Wu, Bangye; Chang, Mawshang
Keywords (English): assembly; next generation sequencing; scaffold
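Next Generation Sequencing (NGS) technologies have been widely used to assemble the genomes of species not yet stud[…]
[…] SAGA carefully refines, through comparative analysis, the boundary of each contig mapped on the related genome […]
For the assembly gaps in each scaffold, SAGA further uses a jumping assembly approach to overcome low-cov[…]
[…] tested SAGA on simulated and real data and compared it with other methods. The experimental results show that SAGA can produce […]
SAGA can scaffold assembled sequences with the help of a related genome whose similarity is only 80%. Compared
with genome optical mapping techniques, SAGA can complete genome assembly at a lower cost.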
Next Generation Sequencing (NGS) technologies have been widely used to assemble
the genomes of unstudied species in the biosphere. In practice, the assembled
genomes are highly fragmented due to the complexity of the genome and the relatively
short length of reads. In this thesis, we design and implement a comparative Scaffolding
And Gap closure Assembler (called SAGA). By comparatively analyzing
a related genome, SAGA carefully refines the boundary of each contig mapped
on the related genome and links the contigs with no rearrangement events into
scaffolds. For each gap within a scaffold, SAGA further uses a jumping assembly
approach to assemble isolated islands of reads in the gap, which overcomes
the limitations of assembling low- or no-coverage regions. SAGA has been tested
and compared with other methods using a variety of simulated and real data
sets. The experimental results indicate that SAGA produces significantly
more contiguous genomes, with larger N50, than other programs. It is
worth mentioning that SAGA can scaffold an assembly using a related
genome with similarity as low as 80%. Compared with physical or optical
map approaches, SAGA is a cost-effective route to genome finishing.
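The abstract reports contiguity in terms of N50. As a reminder of what that metric measures, the following is a minimal sketch of the standard N50 definition (the largest length L such that sequences of length at least L cover at least half of the total assembly length). It is not taken from SAGA; the function name `n50` and the toy lengths are assumptions chosen for illustration, and gap lengths inside scaffolds are ignored.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # at least half of the total assembly length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: five contigs, total length 300.
contigs = [100, 80, 60, 40, 20]

# Hypothetical result of scaffolding: the two longest contigs are linked
# into one scaffold (gap length ignored), the rest stay unchanged.
scaffolds = [180, 60, 40, 20]

print(n50(contigs), n50(scaffolds))
```

In this toy case, linking two contigs raises the N50 from 80 to 180, which is the sense in which a scaffolder produces a "more contiguous" assembly.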
1 Introduction
2 Literature Review
2.1 Next Generation Sequencing
2.1.1 Template Preparation
2.2 Single Nucleotide Polymorphism
2.3 Introduction to Structural Variation
3 Material and Method
3.1 Determination of Contig Boundaries
3.2 Comparative Scaffolding
3.3 Gap Closure
3.3.1 Contig Extension via Comparative Read-Sorting and Paired-End Hits
3.3.2 Jumping Assembly Over Low or No Coverage Area
4 Results and Discussion
4.1 Simulated Data
4.2 Comparison with Existing Approaches
4.2.1 Comparison with Existing Scaffolder
4.2.2 Comparison with Existing Comparative Assembler
4.3 Results on Real Data Set
5 Conclusion and Future Works
5.1 Conclusion
5.2 Future Works
A Supplementary Figures
