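Graduate student: Wang, Hsiaowen (王筱文)
Thesis title: Comparative Scaffolding and Gap Closure Using Next Generation Sequencing
Thesis title (Chinese): 以次世代定序平台進行比較式連續序列之串接與重組缺口之填補
Advisor: Huang, Yaoting (黃耀廷)
Oral defense committee: Chuang, Treesjuen (莊樹諄); Wu, Bangye (吳邦一); Chang, Mawshang (張貿翔)
Oral defense date: 2011-07-29
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Graduate Institute of Computer Science and Information Engineering (資訊工程研究所)
Field: Engineering
Discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar)
Language: English
Pages: 34
Keywords (translated from Chinese): genome assembly; next generation sequencing; scaffolding
Keywords: assembly; next generation sequencing; scaffold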
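Next Generation Sequencing (NGS) technologies have been widely used to assemble
the genomes of species that have not yet been studied. In practice, however, most
assembled genomes remain highly fragmented, because genome sequences are highly
complex and the reads produced by NGS are very short. In this thesis, we design
and implement a software tool for comparative scaffolding and gap closure, named
SAGA. Based on comparative analysis, SAGA carefully refines the boundary
positions of each contig mapped onto the genome of a related species, and links
contigs with no structural variation between them into longer scaffolds. For the
assembly gaps within each scaffold, SAGA further uses a jumping assembly approach
to overcome low-coverage or zero-coverage regions and assemble the discontinuous
reads inside the gap. We tested SAGA on multiple simulated and real data sets and
compared it with other methods. The experimental results show that SAGA produces
more contiguous genomes and achieves a higher N50 than the other methods.
Notably, SAGA can use the genome of a related species with only 80% similarity
as a reference to help link assembled sequences. Compared with optical mapping
of genomes, SAGA completes genome assembly at lower cost.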
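The comparative scaffolding idea in the abstract (order contigs by where they map on a related genome, then link neighbours that show no sign of rearrangement) can be illustrated with a minimal sketch. This is not SAGA's actual algorithm, which the record does not describe in detail. The `Hit` record, the `chain_contigs` function, the `max_gap` threshold, and the use of a strand change or a large gap as a stand-in for a rearrangement event are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record: one contig mapped onto the related genome."""
    contig: str   # contig name
    ref: str      # reference sequence the contig maps to
    start: int    # mapped start on the reference
    end: int      # mapped end on the reference
    strand: str   # "+" or "-"


def chain_contigs(hits, max_gap=10_000):
    """Group contigs into scaffolds by reference order.

    Assumption: two neighbouring contigs are linked only when they lie on
    the same reference sequence, on the same strand, without overlap, and
    at most max_gap bases apart. Anything else starts a new scaffold.
    """
    scaffolds = []
    for hit in sorted(hits, key=lambda h: (h.ref, h.start)):
        prev = scaffolds[-1][-1] if scaffolds else None
        collinear = (
            prev is not None
            and prev.ref == hit.ref
            and prev.strand == hit.strand
            and 0 <= hit.start - prev.end <= max_gap
        )
        if collinear:
            scaffolds[-1].append(hit)   # extend the current scaffold
        else:
            scaffolds.append([hit])     # start a new scaffold
    return [[h.contig for h in scaffold] for scaffold in scaffolds]
```

With four contigs mapped to one reference sequence, a call such as `chain_contigs(hits, max_gap=500)` links only the neighbours that are close together and share an orientation. A real scaffolder would also need to handle overlapping hits, reverse-complemented contigs, and contigs with multiple mappings.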
Next Generation Sequencing (NGS) technologies have been widely used to assemble
the genomes of unstudied species in the biosphere. In practice, the assembled
genomes are very fragmented because of the complexity of the genome and the
relatively short length of the reads. In this thesis, we design and implement a
comparative Scaffolding And Gap closure Assembler (called SAGA). By comparative
analysis against a related genome, SAGA carefully refines the boundary of each
contig mapped onto the related genome and links contigs with no rearrangement
events between them into scaffolds. For each gap within a scaffold, SAGA further
uses a jumping assembly approach to assemble isolated islands of reads in the
gap, which overcomes the limitations of assembling low- or no-coverage regions.
SAGA has been tested and compared with other methods using a variety of
simulated and real data sets. The experimental results indicate that SAGA
produces a significantly more contiguous genome, with a larger N50, than other
programs. It is worth mentioning that SAGA can assist in scaffolding an assembly
using a related genome with similarity as low as 80%. Compared with physical or
optical map approaches, SAGA is a very cost-effective route to genome finishing.
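The abstract reports contiguity as N50. As a reminder of what that metric measures, here is a minimal sketch using the common definition: the length L such that contigs or scaffolds of length at least L cover at least half of the total assembled bases. The function name `n50` and the example lengths are illustrative and are not taken from the thesis.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Illustrative only: joining contigs into scaffolds raises N50.
contigs = [100, 80, 60, 40, 20]
scaffolds = [180, 100, 20]   # 100+80 and 60+40 joined, gap lengths ignored
print(n50(contigs), n50(scaffolds))   # prints "80 180"
```

The toy example ignores the gap bases that a real scaffold would contain, so it only shows the direction of the effect, not a realistic magnitude.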
1 Introduction
2 Literature Review
2.1 Next Generation Sequencing
2.1.1 Template Preparation
2.2 Single Nucleotide Polymorphism
2.3 Introduction to Structural Variation
3 Material and Method
3.1 Determination of Contig Boundaries
3.2 Comparative Scaffolding
3.3 Gap Closure
3.3.1 Contig Extension via Comparative Read-Sorting and Paired-End Hits
3.3.2 Jumping Assembly Over Low or No Coverage Area
4 Results and Discussion
4.1 Simulated Data
4.2 Comparison with Existing Approaches
4.2.1 Comparison with Existing Scaffolder
4.2.2 Comparison with Existing Comparative Assembler
4.3 Results on Real Data Set
5 Conclusion and Future Works
5.1 Conclusion
5.2 Future Works
A Supplementary Figures