National Digital Library of Theses and Dissertations in Taiwan

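Graduate student: 陳芃安 (Peng-An Chen)
Thesis title: DNA Copy Number Data Analysis in Human Genomes
Advisor: 趙坤茂 (Kun-Mao Chao)
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2008
Graduation academic year: 96 (ROC calendar)
Language: English
Number of pages: 44
Keywords: Copy Number Variation; Array Comparative Genomic Hybridization; Acute Myeloid Leukemia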
Copy number variations (CNVs) are a kind of structural variation in the human genome and are associated with many genetic diseases. Array comparative genomic hybridization (array CGH) gives biologists a high-resolution view of DNA copy number. As the resolution of array CGH has increased, however, so has the noise in the data. To cope with this noise, the experimental results are analyzed further to locate copy number variations in the human genome. Lipson et al. [Journal of Computational Biology, 13(2):215-228, 2006] propose a statistical framework that finds the boundaries of copy number variations accurately and reports a significance value for each aberration call. The framework assumes that the noise in array CGH data is normally distributed. However, there is no evidence supporting this assumption, and many other statistical approaches rest on the same assumption. In this thesis, we propose an improved framework that makes no assumption about the noise distribution, and we develop a systematic method for selecting its parameters. Under this framework, a linear-time algorithm proposed by Bernholt et al. [7th Latin American Symposium, pages 178-189, 2006] is used to find copy number variations. Their algorithm, however, cannot find duplication events and deletion events separately, so we also propose a linear-time algorithm for this purpose. We demonstrate the power of our methods by applying them to an array CGH dataset from acute myeloid leukemia patients. Our methods locate the CNVs in the array CGH data more accurately and find many regions that contain genes related to acute myeloid leukemia.
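To make the idea of finding duplication and deletion segments separately more concrete, here is a minimal sketch. It is not the thesis's linear-time algorithm. It is a brute-force O(n^2) search, and the score it uses (segment sum divided by the square root of segment length) is assumed here as a stand-in for the kind of interval score used under a normal-noise model. The function name `best_segment`, the `min_len` lower bound, and the toy log-ratio values are all hypothetical.

```python
from itertools import accumulate
from math import sqrt


def best_segment(values, min_len=1, sign=1):
    """Return (score, start, end) maximizing sign * sum / sqrt(length).

    sign=+1 searches for gains (duplications), sign=-1 for losses (deletions).
    `end` is exclusive. Brute force over all intervals, for illustration only.
    Returns None if no interval of at least `min_len` probes exists.
    """
    prefix = [0.0] + list(accumulate(values))  # prefix[i] == sum(values[:i])
    best = None
    for start in range(len(values)):
        for end in range(start + min_len, len(values) + 1):
            total = prefix[end] - prefix[start]
            score = sign * total / sqrt(end - start)
            if best is None or score > best[0]:
                best = (score, start, end)
    return best


# Made-up log2-ratio profile: a gain around probes 3..5, a loss around 8..9.
log_ratios = [0.1, -0.2, 0.0, 0.9, 1.1, 0.8, 0.1, -0.1, -1.0, -1.2, 0.05]
gain = best_segment(log_ratios, min_len=2, sign=+1)
loss = best_segment(log_ratios, min_len=2, sign=-1)
print("gain segment:", gain[1], gain[2])
print("loss segment:", loss[1], loss[2])
```

Running the search once per sign is what keeps gains and losses apart: a single search on the absolute score would report only whichever event scores higher.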
1 Introduction
1.1 Related Works
1.2 Problem Definitions
2 The Aberrant Segments Finding Problem
2.1 Preliminaries
2.2 Methods
2.3 Lower Bound Selection
2.4 Results and Discussion
2.5 Analysis of DNA Copy Number Data among Multiple Samples
3 The Amplification Segments Finding Problem
3.1 Algorithms
3.2 Time Analysis
4 Concluding Remarks
4.1 Future Works
Bibliography
[1] C. D. Baldus, S. Liyanarachchi, K. Mrózek, H. Auer, S. M. Tanner, M. Guimond, A. S.
Ruppert, N. Mohamed, R. V. Davuluri, M. A. Caligiuri, et al. Acute myeloid leukemia
with complex karyotypes and abnormal chromosome 21: amplification discloses overexpression
of APP, ETS2, and ERG genes. Proceedings of the National Academy of Sciences,
101:3915–3920, 2004.
[2] M. T. Barrett, A. Scheffer, A. Ben-Dor, N. Sampas, D. Lipson, R. Kincaid, P. Tsang,
B. Curry, K. Baird, P. S. Meltzer, Z. Yakhini, L. Bruhn, and S. Laderman. Comparative
genomic hybridization using oligonucleotide microarrays and total genomic DNA.
Proceedings of the National Academy of Sciences, 101:17765–17770, 2004.
[3] A. Ben-Dor, D. Lipson, A. Tsalenko, M. Reimers, L. O. Baumbusch, M. T. Barrett, J. N.
Weinstein, A.-L. Børresen-Dale, and Z. Yakhini. Framework for identifying common aberrations
in DNA copy number data. In Proceedings of the 11th Annual International Conference
on Research in Computational Molecular Biology. Lecture Notes in Computer Science,
volume 4453, pages 122–136, 2007.
[4] J. M. Bennett, D. Catovsky, M.-T. Daniel, G. Flandrin, D. A. G. Galton, H. R. Gralnick,
and C. Sultan. Proposals for the classification of the acute leukaemias French-American-
British (FAB) Co-operative Group. British Journal of Haematology, 33:451–458, 1976.
[5] T. Bernholt, F. Eisenbrand, and T. Hofmeister. A geometric framework for solving subsequence
problems in computational biology efficiently. In Proceedings of the 23rd Annual
Symposium on Computational Geometry, pages 310–318, 2007.
[6] T. Bernholt and T. Hofmeister. An algorithm for a generalized maximum subsequence
problem. In 7th Latin American Symposium, pages 178–189, 2006.
[7] N. P. Carter. Methods and strategies for analyzing copy number variation using DNA
microarrays. Nature Genetics, 39:S16–S21, 2007.
[8] T. M. Chan. More algorithms for all-pairs shortest paths in weighted graphs. In Annual
ACM Symposium on Theory of Computing, pages 590–598, 2007.
[9] G. A. Churchill. Fundamentals of experimental design for cDNA microarrays. Nature
Genetics, 32:490–495, 2002.
[10] A. C. Davison and D. V. Hinkley. Bootstrap Methods and Their Application. Cambridge
University Press, 1997.
[11] S. J. Diskin, T. Eck, J. Greshock, Y. P. Mosse, T. Naylor, C. J. Stoeckert Jr., B. L.
Weber, J. M. Maris, and G. R. Grant. STAC: A method for testing the significance of
DNA copy number aberrations across multiple array-CGH experiments. Genome Research,
16:1149–1158, 2006.
[12] B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall/CRC,
1994.
[13] P. H. C. Eilers and R. X. de Menezes. Quantile smoothing of array CGH data. Bioinformatics,
21:1146–1153, 2005.
[14] W. El-Rifai, E. Elonen, M. Larramendy, T. Ruutu, and S. Knuutila. Chromosomal breakpoints
and changes in DNA copy number in refractory acute myeloid leukemia. Leukemia,
11:958–963, 1997.
[15] T.-H. Fan, S. Lee, H.-I. Lu, T.-S. Tsou, T.-C. Wang, and A. Yao. An optimal algorithm
for maximum-sum segment and its application in bioinformatics. In 8th International
Conference on Implementation and Application of Automata, pages 251–257, 2003.
[16] L. Feuk, A. R. Carson, and S. W. Scherer. Structural variation in the human genome.
Nature Reviews Genetics, 7:85–97, 2006.
[17] J. L. Freeman, G. H. Perry, L. Feuk, R. Redon, S. A. McCarroll, D. M. Altshuler, H. Aburatani,
K. W. Jones, C. Tyler-Smith, M. E. Hurles, N. P. Carter, S. W. Scherer, and C. Lee.
Copy number variation: new insights in genome diversity. Genome Research, 16:949–961,
2006.
[18] J. Fridlyand, A. M. Snijders, D. Pinkel, D. G. Albertson, and A. N. Jain. Hidden Markov
models approach to the analysis of array CGH data. Journal of Multivariate Analysis,
90:132–153, 2004.
[19] A. Hamosh, A. F. Scott, J. Amberger, C. Bocchini, D. Valle, and V. A. McKusick. Online
Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic
disorders. Nucleic Acids Research, 30:52–55, 2002.
[20] R. V. Hogg and E. A. Tanis. Probability and Statistical Inference. Pearson Prentice Hall,
7th edition, 2006.
[21] L. Hsu, S. G. Self, D. Grove, T. Randolph, K. Wang, J. J. Delrow, L. Loo, and P. Porter.
Denoising array-based comparative genomic hybridization data using wavelets. Biostatistics,
6(2):211–226, 2005.
[22] Y.-T. Huang. A Study on Some Optimization Problems Related to SNPs and Haplotypes.
PhD thesis, National Taiwan University, 2006.
[23] P. Hupé, N. Stransky, J.-P. Thiery, F. Radvanyi, and E. Barillot. Analysis of array CGH
data: from signal ratio to gain and loss of DNA regions. Bioinformatics, 20(18):3413–3422,
2004.
[24] A. J. Iafrate, L. Feuk, M. N. Rivera, M. L. Listewnik, P. K. Donahoe, Y. Qi, S. W.
Scherer, and C. Lee. Detection of large-scale variation in the human genome. Nature
Genetics, 36:949–951, 2004.
[25] K. Inoue and J. R. Lupski. Molecular mechanisms for genomic disorders. Annual Review
of Genomics and Human Genetics, 3:199–242, 2002.
[26] K. Jong, E. Marchiori, A. van der Vaart, B. Ylstra, M. Weiss, and G. Meijer. Chromosomal
breakpoint detection in human cancer. In Applications of Evolutionary Computing, Lecture
Notes in Computer Science, volume 2611, pages 107–116, 2003.
[27] A. Kallioniemi, O.-P. Kallioniemi, D. Sudar, D. Rutovitz, J. W. Gray, F. Waldman, and
D. Pinkel. Comparative genomic hybridization for molecular cytogenetic analysis of solid
tumors. Science, 258:818–821, 1992.
[28] D. Komura, F. Shen, S. Ishikawa, K. R. Fitch, W. Chen, J. Zhang, G. Liu, S. Ihara,
H. Nakamura, M. E. Hurles, et al. Genome-wide detection of human copy number variations
using high-density DNA oligonucleotide arrays. Genome Research, 16:1575–1584,
2006.
[29] W. R. Lai, M. D. Johnson, R. Kucherlapati, and P. J. Park. Comparative analysis of
algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics,
21(19):3763–3770, 2005.
[30] Z.-Y. Li, D.-P. Liu, and C.-C. Liang. New insight into the molecular mechanisms of MLL-associated
leukemia. Leukemia, 19:183–190, 2005.
[31] H. W. Lilliefors. On the Kolmogorov-Smirnov test for normality with mean and variance
unknown. Journal of the American Statistical Association, 62(318):399–402, 1967.
[32] Y.-L. Lin, T. Jiang, and K.-M. Chao. Efficient algorithms for locating the length-constrained
heaviest segments with applications to biomolecular sequence analysis. Journal
of Computer and System Sciences, 65(3):570–586, 2002.
[33] O. C. Lingjærde, L. O. Baumbusch, K. Liestøl, I. K. Glad, and A.-L. Børresen-Dale.
CGH-Explorer: a program for analysis of array-CGH data. Bioinformatics, 21:821–822,
2005.
[34] D. Lipson, Y. Aumann, A. Ben-Dor, N. Linial, and Z. Yakhini. Efficient calculation of
interval scores for DNA copy number data analysis. Journal of Computational Biology,
13(2):215–228, 2006.
[35] H.-F. Liu. Constrained Searching and Ordering Problems on Sequences. PhD thesis, National
Taiwan University, 2008.
[36] H.-F. Liu, P.-A. Chen, and K.-M. Chao. Algorithms for computing the length-constrained
max-score segments with applications to DNA copy number data analysis. In Proceedings
of the 18th International Symposium on Algorithms and Computation. Lecture Notes in
Computer Science, volume 4835, pages 834–845, 2007.
[37] R. Lucito, J. Healy, J. Alexander, A. Reiner, D. Esposito, M. Chi, L. Rodgers, A. Brady,
J. Sebat, J. Troge, et al. Representational oligonucleotide microarray analysis: A high-resolution
method to detect genome copy number variation. Genome Research, 13:2291–2305,
2003.
[38] S. A. McCarroll and D. M. Altshuler. Copy-number variation and association studies of
human disease. Nature Genetics, 39:S37–S42, 2007.
[39] D. S. Moore and G. P. McCabe. Introduction to the Practice of Statistics. W.H. Freeman
& Company, 5th edition, 2006.
[40] K. Mrózek, G. Marcucci, P. Paschka, S. P. Whitman, and C. D. Bloomfield. Clinical
relevance of mutations and gene-expression changes in adult acute myeloid leukemia with
normal cytogenetics: are we ready for a prognostically prioritized molecular classification?
Blood, 109:431–448, 2007.
[41] C. L. Myers, M. J. Dunham, S. Y. Kung, and O. G. Troyanskaya. Accurate detection of aneuploidies
in array CGH and gene expression microarray data. Bioinformatics, 20(18):3533–3543,
2004.
[42] A. B. Olshen and E. S. Venkatraman. Circular binary segmentation for the analysis of
array-based DNA copy number data. Biostatistics, 5(4):557–572, 2004.
[43] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw Hill, 3rd
edition, 1991.
[44] F. Picard, S. Robin, M. Lavielle, C. Vaisse, and J.-J. Daudin. A statistical approach for
array CGH data analysis. BMC Bioinformatics, 6:27, 2005.
[45] D. Pinkel, R. Segraves, D. Sudar, S. Clark, I. Poole, D. Kowbel, C. Collins, W.-L. Kuo,
C. Chen, Y. Zha, et al. High resolution analysis of DNA copy number variation using
comparative genomic hybridization to microarrays. Nature Genetics, 20:207–211, 1998.
[46] J. R. Pollack, C. M. Perou, A. A. Alizadeh, M. B. Eisen, A. Pergamenschikov, C. F.
Williams, S. S. Jeffrey, D. Botstein, and P. O. Brown. Genome-wide analysis of DNA
copy-number changes using cDNA microarrays. Nature Genetics, 23(1):41–46, 1999.
[47] J. R. Pollack, T. Sørlie, C. M. Perou, C. A. Rees, S. S. Jeffrey, P. E. Lønning, R. Tibshirani,
D. Botstein, A.-L. Børresen-Dale, and P. O. Brown. Microarray analysis reveals a major
direct role of DNA copy number alteration in the transcriptional program of human breast
tumors. Proceedings of the National Academy of Sciences, 99(20):12963–12968, 2002.
[48] J. Quackenbush. Microarray data normalization and transformation. Nature Genetics,
32:496–501, 2002.
[49] F. G. Rücker, L. Bullinger, C. Schwaenen, D. B. Lipka, S. Wessendorf, S. Fröhling,
M. Bentz, S. Miller, C. Scholl, R. F. Schlenk, et al. Disclosure of candidate genes in
acute myeloid leukemia with complex karyotypes using microarray-based molecular characterization.
Journal of Clinical Oncology, 24:3887–3894, 2006.
[50] R. Redon, S. Ishikawa, K. R. Fitch, L. Feuk, G. H. Perry, T. D. Andrews, H. Fiegler, M. H.
Shapero, A. R. Carson, W. Chen, et al. Global variation in copy number in the human
genome. Nature, 444:444–454, 2006.
[51] A. Renneville, C. Roumier, V. Biggio, O. Nibourel, N. Boissel, P. Fenaux, and C. Preudhomme.
Cooperating gene mutations in acute myeloid leukemia: a review of the literature.
Leukemia, 22:915–931, 2008.
[52] C. Rouveirol, N. Stransky, P. Hupé, P. L. Rosa, E. Viara, E. Barillot, and F. Radvanyi.
Computation of recurrent minimal genomic alterations from array-CGH data. Bioinformatics,
22(7):849–856, 2006.
[53] J. Sebat, B. Lakshmi, J. Troge, J. Alexander, J. Young, P. Lundin, S. Månér, H. Massa,
M. Walker, M. Chi, et al. Large-scale copy number polymorphism in the human genome.
Science, 305:525–528, 2004.
[54] S. P. Shah, W. L. Lam, R. T. Ng, and K. P. Murphy. Modeling recurrent DNA copy
number alterations in array CGH data. Bioinformatics, 23:i450–i458, 2007.
[55] S. S. Shapiro and M. B. Wilk. An analysis of variance test for normality (complete samples).
Biometrika, 52:591–611, 1965.
[56] A. J. Sharp, D. P. Locke, S. D. McGrath, Z. Cheng, J. A. Bailey, R. U. Vallente, L. M.
Pertz, R. A. Clark, S. Schwartz, R. Segraves, V. V. Oseroff, D. G. Albertson, D. Pinkel, and
E. E. Eichler. Segmental duplications and copy-number variation in the human genome.
The American Journal of Human Genetics, 77(1):78–88, 2005.
[57] S. Solinas-Toldo, S. Lampel, S. Stilgenbauer, J. Nickolenko, A. Benner, H. Döhner, T. Cremer,
and P. Lichter. Matrix-based comparative genomic hybridization: biochips to screen
for genomic imbalances. Genes, Chromosomes and Cancer, 20(4):399–407, 1997.
[58] E. Tuzun, A. J. Sharp, J. A. Bailey, R. Kaul, V. A. Morrison, L. M. Pertz, E. Haugen,
H. Hayden, D. Albertson, D. Pinkel, et al. Fine-scale structural variation of the human
genome. Nature Genetics, 37:727–732, 2005.
[59] J. W. Vardiman, N. L. Harris, and R. D. Brunning. The World Health Organization
(WHO) classification of the myeloid neoplasms. Blood, 100:2292–2302, 2002.
[60] E. S. Venkatraman and A. B. Olshen. A faster circular binary segmentation algorithm for
the analysis of array CGH data. Bioinformatics, 23(6):657–663, 2007.
[61] A. R. M. von Bergh, P. M. Wijers, A. J. Groot, S. van Zelderen-Bhola, J. H. F. Falkenburg,
P. M. Kluin, and E. Schuuring. Identification of a novel RAS GTPase-activating protein
(RASGAP) gene at 9q34 as an MLL fusion partner in a patient with de novo acute myeloid
leukemia. Genes, Chromosomes and Cancer, 39:324–334, 2004.
[62] H. Willenbrock and J. Fridlyand. A comparison study: applying segmentation to array
CGH data for downstream analyses. Bioinformatics, 21(22):4084–4091, 2005.
[63] A. F. Wright. Nature Encyclopedia of the Human Genome 2, pages 959–968. Nature
Publishing Group, London, 2003.
[64] Y. Yamashita, K. Minoura, T. Taya, S.-i. Fujiwara, K. Kurashina, H. Watanabe, Y. L.
Choi, M. Soda, H. Hatanaka, M. Enomoto, et al. Analysis of chromosome copy number in
leukemic cells by different microarray platforms. Leukemia, 21:1333–1337, 2007.