
National Digital Library of Theses and Dissertations in Taiwan

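Graduate student: Wang, Chunchieh (王俊傑)
Thesis title: Haplotype Inference in Diploid Populations Using Paired-End Sequencing
Thesis title (Chinese): 以雙端定序進行族群之單體型重組
Advisor: Huang, Yaoting (黃耀廷)
Committee members: Chang, Maohsiang (張貿翔); Wu, Pangyi (吳邦一); Huang, Yaoting (黃耀廷)
Oral defense date: 2012-07-26
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Graduate Institute of Computer Science and Information Engineering (資訊工程研究所)
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2012
Academic year of graduation: 100 (ROC calendar)
Language: English
Pages: 30
Keywords (Chinese): 單體型推論; 最大期望演算法; 雙倍基因體
Keywords: Haplotype inference; Expectation-maximization algorithm; Diploid genome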
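Abstract (translated from the Chinese): Inferring the haplotypes of single nucleotide polymorphisms (SNPs) in a diploid genome from a set of genotype data is a challenging task in bioinformatics. Next-generation sequencing platforms can now produce paired-end short reads, and the information in these reads can partially reconstruct the two haplotype sequences. For the haplotype inference problem, the literature offers two broad classes of methods, statistical and combinatorial optimization, but none of these methods incorporates paired-end sequencing information to make the inferred haplotypes more accurate. In this thesis we propose and implement a new framework, which we name PE-EM. It uses paired-end sequencing information as a prior assumption and combines it with the expectation-maximization algorithm to find the maximum likelihood estimate of haplotype frequencies in the population. Results from a range of simulation experiments show that PE-EM is more accurate than the traditional EM method. Haplotypes recovered with paired-end sequencing information can aid disease-related research and support the development of personalized treatment strategies.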
Abstract: Haplotype inference is a challenging problem in bioinformatics: given the genotypes of diploid organisms, infer the haplotypes formed by their single nucleotide polymorphisms (SNPs). Paired-end reads produced by next-generation sequencing (NGS) can be used to partially reconstruct the paternal and maternal haplotypes. Several statistical and combinatorial approaches have been developed in the literature, but few of them incorporate paired-end sequencing data to identify the two true haplotypes of each individual in a population. In this thesis, we propose and implement a new framework called PE-EM. It is based on the expectation-maximization (EM) algorithm and finds the maximum likelihood estimate of haplotype frequencies while using the linkage between SNPs observed in paired-end reads as a prior. The experimental results on simulated data indicate that PE-EM is more accurate than the traditional EM algorithm. With this haplotype information, researchers can perform association studies of the genetic variants involved in diseases and in individual responses to therapeutic agents.
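The idea of combining EM with read-derived linkage can be illustrated with a minimal sketch. This is not the thesis's PE-EM implementation: it is a plain EM estimator of haplotype frequencies under Hardy-Weinberg equilibrium, in which paired-end information is simplified to hard constraints that discard phasings contradicted by a read link. The function names, the 0/1/2 genotype encoding, and the `(i, j, in_cis)` link format are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype, links=()):
    """Enumerate unordered haplotype pairs consistent with one genotype.

    genotype: tuple holding 0, 1 or 2 copies of the alternate allele per SNP.
    links: hypothetical read-derived constraints (i, j, in_cis) between two
           heterozygous SNPs; in_cis=True means the alternate alleles at
           i and j lie on the same haplotype.
    """
    het = [k for k, g in enumerate(genotype) if g == 1]
    free = het[1:]  # phase of the first het site is fixed to avoid mirror duplicates
    pairs = []
    for bits in product((0, 1), repeat=len(free)):
        h1 = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
        h2 = list(h1)
        if het:
            h1[het[0]], h2[het[0]] = 1, 0
        for k, b in zip(free, bits):
            h1[k], h2[k] = b, 1 - b
        # Keep only phasings that agree with every (assumed) paired-end link.
        if all((h1[i] == h1[j]) == in_cis for i, j, in_cis in links):
            pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes, links_per_individual=None, iterations=50):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    if links_per_individual is None:
        links_per_individual = [()] * len(genotypes)
    candidates = [compatible_pairs(g, l)
                  for g, l in zip(genotypes, links_per_individual)]
    if any(not pairs for pairs in candidates):
        raise ValueError("links leave no compatible phasing for some individual")
    haplotypes = {h for pairs in candidates for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(iterations):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: posterior probability of each candidate phasing.
            weights = [(1 if h1 == h2 else 2) * freq[h1] * freq[h2]
                       for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the 2N chromosomes.
        freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
    return freq
```

In this sketch a double heterozygote such as `(1, 1)` has two candidate phasings without links, and a single link such as `(0, 1, False)` reduces them to one. The thesis describes using linkage as a prior rather than as a hard filter, so a closer approximation would reweight phasings instead of discarding them.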
1 Introduction
2 Literature Review
2.1 Introduction to Next-Generation Sequencing
2.2 Introduction to Single Nucleotide Polymorphism
2.3 The haplotype inference problem (HIP)
2.4 Hardy-Weinberg equilibrium (HWE)
2.5 Expectation-Maximization Algorithm for HIP
3 Material and Method
3.1 Notation Definition
3.2 Extraction of Direct and Transitive Linkage between SNPs by Paired-end Reads
3.3 EM Algorithm Incorporated with Paired-end Links for Haplotype Frequencies Estimation
3.3.1 Expectation Step in PE-EM
3.3.2 Maximization Step in PE-EM
3.3.3 PE-EM Algorithm Convergence
3.3.4 Pareto Distribution
4 Results and Discussion
4.1 Accuracy Definition
4.2 The Experiment Results of Simulated Data
4.3 Discussion of PE-EM
5 Conclusion and Future Work
5.1 Summary
5.2 Future Works
Reference

