Author (English name): Jo-Chun Cheng
Thesis title (English): A Study of Similarity Measures of DNA Sequences under Evolution
Advisors (English names): Tiee-Jian Wu; Lung-An Li
Keywords (English): mutation rate; exon; recombination rate; standardized Euclidean distance; Kullback-Leibler discrepancy; evolution model
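(1) Human protein sequences and their corresponding gene sequences were collected from the PIR (Protein Information Resource) website, the exon information on the gene sequences was recorded, and the probability distributions of the collected information were examined.
(2) A statistical model of human evolution was built to approximate the real human evolutionary process; the evolution factors considered in the model include the mutation rate, the gene recombination rate, the death rate, and the population growth rate.
(3) Starting from one man and one woman as the first human ancestors, the statistical evolution model above was used to simulate, generation by generation, two sets of 5000 offspring of the ten-thousandth generation; the standardized Euclidean distance and the Kullback-Leibler discrepancy between the offspring's gene sequences were then computed.
(4) The dissimilarity between the gene sequences of offspring produced under different parameter settings of the evolution model was examined, and analysis of variance and logistic regression were used to assess the significance of each evolution factor.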
Evolution is the most important process that every organism and species undergoes. It consists of mutation, recombination of genes, survival of environmental changes, and so on. This thesis is composed of four steps. First, we searched for proteins belonging to a protein family or superfamily on the PIR (Protein Information Resource) web site (http://pir.georgetown.edu). We recorded the nucleotide sequences encoding those proteins, together with the exon information of these DNA sequences, and then found the empirical distributions of this exon information. Next, we constructed an evolution model to imitate the real evolutionary history of human beings. The proposed model includes evolution factors such as the mutation rate, the recombination rate of genes, the death rate, and the growth of the population size. In the third step, we conducted a simulation study using the new evolution model. For each of nine combinations of the levels of the evolution factors, we generated, in two separate runs, 5000 offspring of the ten-thousandth generation from a single ancestral pair, and we evaluated the dissimilarity between the DNA sequences of each combination using the standardized Euclidean distance and the Kullback-Leibler discrepancy function. Last, ANOVA and logistic regression were applied to statistics of the dissimilarity measures to find the significant evolution factors, and estimates for the levels of those factors were obtained to understand their effects.
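As a rough illustration of the two dissimilarity measures named above, the following Python sketch compares two DNA sequences through their word (k-mer) frequencies. It is a minimal sketch under stated assumptions, not the thesis's actual procedure: the word length `k`, the `pseudocount` smoothing, the function names, and the per-word `variances` supplied by the caller are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def word_frequencies(seq, k=2, pseudocount=1.0):
    """Relative frequencies of all 4**k overlapping k-words in seq.

    The pseudocount (an assumption of this sketch) keeps every
    frequency positive so that the logarithm below is defined.
    Words containing letters outside ACGT are ignored.
    """
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words) + pseudocount * len(words)
    return {w: (counts[w] + pseudocount) / total for w in words}


def kl_discrepancy(p, q):
    """Kullback-Leibler discrepancy: sum over words of p_w * log(p_w / q_w)."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def standardized_euclidean(p, q, variances):
    """Euclidean distance with each squared difference divided by a
    per-word variance. How the variances are obtained is left to the
    caller; that choice is an assumption of this sketch."""
    return math.sqrt(sum((p[w] - q[w]) ** 2 / variances[w] for w in p))


if __name__ == "__main__":
    a = word_frequencies("ACGTACGTTTGACA")
    b = word_frequencies("ACGTTCGATTGACC")
    unit_var = {w: 1.0 for w in a}  # placeholder variances
    print(kl_discrepancy(a, b))
    print(standardized_euclidean(a, b, unit_var))
```

With unit variances the second measure reduces to the ordinary Euclidean distance between the frequency vectors. Note that the Kullback-Leibler discrepancy is not symmetric in its two arguments.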
Chapter 1 Introduction
1.1 Messenger RNA
1.2 Recombination of Genes
1.3 Mutation in the DNA
1.4 Dissimilarity Measures of DNA Sequence
1.5 Outline
Chapter 2 Distribution of the Number, Length and Position of Exons
2.1 Data Collection
2.2 Distribution of the Number of Exons
2.3 Distribution of the Length of Exons
2.4 Distribution of the Position of Exons
Chapter 3 An Evolution Model
3.1 Population Size per Generation
3.2 Creation of the Filial Generation
3.3 Factors of the Evolution Model
3.3.1 Mutation of the Evolution Model
3.3.2 Recombination of Genes of the Evolution Model
Chapter 4 A Simulation Study
4.1 The First Ancestor Sequences
4.2 Simulation Process
4.2.1 Population Size per Generation
4.2.2 Parameters of the Evolution Model
4.2.3 Dissimilarity Measures of DNA Sequence
4.3 Simulation Result
Chapter 5 Finding Significant Evolution Factors
5.1 The Analysis of Variance
5.2 The Logistic Regression Model
Chapter 6 Conclusion
Reference
Appendix
