National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed record

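Graduate student: Chuang, Shengyu (莊盛宇)
Thesis title: Simultaneous Haplotype Assembly and Structural Variations Detection Using Next Generation Sequencing
Thesis title (Chinese): 以次世代定序平台同時進行單體型之重組與結構性變異之偵測
Advisor: Huang, Yaoting (黃耀廷)
Oral defense committee: Liu, Chunchi (劉俊吉); Ting, Chuankang (丁川康); Chan, Michael (陳永恩); Huang, Yaoting (黃耀廷)
Oral defense date: 2011-09-22
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Graduate Institute of Computer Science and Information Engineering (資訊工程研究所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2011
Graduation academic year: 100 (ROC calendar)
Language: English
Number of pages: 44
Keywords: Haplotype assembly; Genetic algorithm; Diploid genome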
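Abstract (translated from the Chinese): Most species in the biosphere have a diploid genome composed of a pair of haplotypes. However, the assembly software currently available for next-generation sequencing platforms can reconstruct only a single sequence, and that sequence is a mosaic containing information from both haplotypes. In addition, the sequence differences between the two haplotypes include single nucleotide polymorphisms (SNPs) as well as large-scale structural variations (SVs). Reconstructing the two haplotype sequences of a diploid genome with next-generation platforms therefore remains a formidable task. In this thesis we design and implement a new framework, named HapSVAssembler, that assembles the two haplotype sequences of a diploid genome from short paired-end reads. HapSVAssembler first combines several assembly algorithms to reconstruct a single reference sequence, called the reference genome. By aligning the paired-end reads against the reference genome, it then locates the coordinates of heterozygous SNPs and heterozygous SVs. Finally, it analyzes the paired-end reads that span two or more heterozygous SNPs or heterozygous SVs in order to separate and reconstruct the two complete haplotype sequences. For the haplotype assembly step we define a new optimization problem and design a genetic algorithm (GA) to solve it. Results from a range of simulation experiments show that HapSVAssembler achieves better assembly accuracy and completeness than previous methods. HapSVAssembler can also help analyze linkage disequilibrium between different kinds of genetic variation.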
Abstract (English): Most species in the biosphere have a diploid genome composed of two haplotypes. However, existing short-read assemblers for next-generation sequencing (NGS) platforms reconstruct only one consensus sequence, which is a mosaic of the two haplotypes. In addition, the differences between the two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). De novo haplotype assembly of a diploid genome is therefore still a challenging task on NGS platforms. In this thesis, we design and implement a new framework called HapSVAssembler for de novo assembly of a diploid genome using short paired-end reads. HapSVAssembler uses a hybrid assembly approach to build a consensus sequence, identifies heterozygous SNP and SV loci, and simultaneously reconstructs the SNP/SV haplotypes from reads spanning two or more SNPs/SVs. A new optimization problem is formulated and solved by a genetic algorithm (GA). The experimental results indicate that the assembly accuracy and continuity of HapSVAssembler are much higher than those of previous methods. Because it can assemble haplotypes containing multiple types of genomic variation, HapSVAssembler is useful for studying linkage disequilibrium across different kinds of variation.
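The haplotype assembly step can be illustrated with a small sketch. This record does not give the thesis's constrained MEC formulation or its GA design, so the code below is only an assumption-laden illustration: it uses the standard minimum error correction (MEC) cost as a stand-in objective and a generic genetic algorithm (tournament selection, one-point crossover, bit-flip mutation, elitism). The function names, parameters, and the toy fragment matrix are all hypothetical and are not taken from HapSVAssembler.

```python
import random


def mismatches(fragment, haplotype):
    """Count covered sites where a fragment disagrees with a haplotype.

    A fragment is a string over '0', '1', '-' ('-' means the site is not covered).
    """
    return sum(1 for allele, h in zip(fragment, haplotype)
               if allele != "-" and int(allele) != h)


def mec_score(fragments, haplotype):
    """Standard MEC cost (stand-in objective, not the thesis's constrained MEC).

    Each fragment is charged its distance to the closer of the haplotype
    and the complementary haplotype.
    """
    complement = [1 - h for h in haplotype]
    return sum(min(mismatches(f, haplotype), mismatches(f, complement))
               for f in fragments)


def ga_haplotype(fragments, n_sites, pop_size=30, generations=50,
                 mutation_rate=0.05, seed=0):
    """Generic GA over 0/1 haplotype vectors; requires n_sites >= 2, pop_size >= 2."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_sites)]
                  for _ in range(pop_size)]
    best = min(population, key=lambda h: mec_score(fragments, h))

    def pick():
        # Binary tournament: the candidate with the lower MEC cost wins.
        a, b = rng.sample(population, 2)
        return a if mec_score(fragments, a) <= mec_score(fragments, b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best cost never gets worse
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n_sites - 1)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                # bit-flip mutation
            children.append(child)
        population = children
        best = min(population, key=lambda h: mec_score(fragments, h))
    return best, mec_score(fragments, best)


# Hypothetical toy input: rows are reads, columns are heterozygous sites.
toy_fragments = ["0101", "1010", "01--", "0100"]
best_haplotype, best_cost = ga_haplotype(toy_fragments, n_sites=4)
print(best_haplotype, best_cost)
```

In this encoding a haplotype and its complement always receive the same cost, so the search only needs to recover one of the two; the other haplotype is obtained by flipping every site.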
1 Introduction
2 Literature Review
2.1 Introduction to Next-Generation Sequencing
2.2 Introduction to Single Nucleotide Polymorphism
2.3 Minimum Error Correction Model
2.4 Introduction to Structure Variations
3 Material and Method
3.1 de novo Assembly Using Hybrid Approach
3.2 Heterozygous SNP/SV Detection
3.2.1 SV Detection by Discordant Reads
3.2.2 SV Boundary Refinement by Breakpoint Reads
3.3 SNP/SV Matrix Construction and Haplotype Blocks Partition
3.3.1 SNP/SV Matrix Construction
3.3.2 Haplotype Blocks Partition
3.4 Haplotype Assembly
3.4.1 Constrained MEC Formulation
3.4.2 GA Framework
4 Results and Discussion
4.1 Reconstructed Accuracy Definition
4.2 The Experiment Results of Simulated Data
4.3 The Result of Chlorella Sorokiniana Genome
4.4 Discussion of GA in CMEC
5 Conclusion and Future Work
5.1 Summary
5.2 Future Works
A Supplementary Figures