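Graduate student: 鄭若君
Graduate student (romanized): Jo-Chun Cheng
Thesis title (Chinese): 基因序列演化後之相似性研究
Thesis title (English): A Study of Similarity Measures of DNA Sequences under Evolution
Advisors: 吳鐵肩, 李隆安
Advisors (romanized): Tiee-Jian Wu, Lung-An Li
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Statistics (Master's and Doctoral Program)
Discipline: Mathematics and Statistics
Field: Statistics
Thesis type: Academic thesis
Year of publication: 2003
Academic year of graduation: 91 (ROC calendar)
Language: English
Number of pages: 92
Keywords (translated from Chinese): standardized Euclidean distance, exon, evolution model, mutation rate, recombination rate
Keywords (English): mutation rate, the exon, recombination rate, standardized Euclidean distance, Kullback-Leibler discrepancy, evolution model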
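Evolution is a process that every species surviving today has gone through; biological evolution involves gene mutation, genetic recombination, adaptation to changes in the external environment, and other factors. This study consists of the following four steps:
(1) Human protein sequences and their corresponding gene sequences were collected from the PIR (Protein Information Resource) website, the exon information on those gene sequences was recorded, and the probability distributions of the collected information were examined.
(2) A statistical model of human evolution was built to approximate the real course of human evolution; the evolutionary factors considered in the model include the mutation rate, the genetic recombination rate, the death rate, and the population growth rate.
(3) Starting from one man and one woman as the human ancestors, the statistical evolution model was used to simulate, generation by generation, two sets of 5000 offspring in the ten-thousandth generation, and the standardized Euclidean distance and the Kullback-Leibler discrepancy between the offspring's gene sequences were then computed.
(4) The dissimilarity between the gene sequences of offspring generated by the statistical evolution model under different parameter settings was examined, and analysis of variance and logistic regression were used to assess the significance of each evolutionary factor.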
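As a rough illustration of the simulation idea in steps (2) and (3), here is a minimal Python sketch of one generation step with point mutation and single-point recombination. It is not the thesis's model: this record does not give the model's details, so the function names, the single-crossover rule, the uniform choice of parents, and the omission of the death rate, population growth, and exon structure are all assumptions made for illustration.

```python
import random

BASES = "ACGT"

def mutate(seq, mutation_rate, rng):
    """Replace each base, independently with probability mutation_rate,
    by one of the three other bases (hypothetical mutation rule)."""
    out = []
    for base in seq:
        if rng.random() < mutation_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def recombine(a, b, recombination_rate, rng):
    """With probability recombination_rate, join a prefix of `a` to the
    matching suffix of `b` at a random cut point; otherwise return `a`."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if len(a) > 1 and rng.random() < recombination_rate:
        cut = rng.randrange(1, len(a))  # cut strictly inside the sequence
        return a[:cut] + b[cut:]
    return a

def next_generation(population, size, mutation_rate, recombination_rate, rng):
    """Build `size` children; each child draws two distinct parents at
    random, recombines them, and then mutates the result."""
    children = []
    for _ in range(size):
        mother, father = rng.sample(population, 2)
        child = recombine(mother, father, recombination_rate, rng)
        children.append(mutate(child, mutation_rate, rng))
    return children
```

Calling `next_generation` repeatedly, starting from a two-sequence ancestor population and passing a seeded `random.Random` instance, gives a reproducible toy version of the generation-by-generation simulation.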
Evolution is the most important process that organisms and species of all kinds have experienced. It involves mutation, recombination of genes, survival of environmental changes, and so on. This thesis consists of four steps. First, we searched for proteins belonging to a protein family or superfamily on the PIR (Protein Information Resource) web site (http://pir.georgetown.edu). We recorded the nucleotide sequences encoding those proteins and the information on the exons in these DNA sequences, and then found the empirical distributions of this exon information. Next, we constructed an evolution model to imitate the real evolutionary history of human beings. Evolution factors such as the mutation rate, the recombination rate of genes, the death rate, and the growth of the population size were included in the proposed model. Third, we conducted a simulation study using the new evolution model. For each of nine combinations of the levels of the evolution factors, we twice generated 5000 offspring of the ten-thousandth generation from one ancestral parent generation, and we evaluated the dissimilarity between the DNA sequences of each combination with the standardized Euclidean distance and the Kullback-Leibler discrepancy function. Last, analysis of variance (ANOVA) and logistic regression were applied to some statistics of the dissimilarity measures to find the significant evolution factors, and estimates for the levels of the evolution factors were obtained to understand their effects.
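The two dissimilarity measures named in the abstract can be sketched on word (k-mer) frequencies as follows. This is a sketch under stated assumptions rather than the thesis's definitions, which this record does not give: the overlapping word counting, the caller-supplied per-word variances used for standardization, the optional pseudocount, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def word_frequencies(seq, k, pseudocount=0.0):
    """Relative frequencies of all 4**k DNA words of length k, counted
    with overlap; `pseudocount` (an assumption) is added to every word."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words) + pseudocount * len(words)
    if total == 0:
        raise ValueError("sequence contains no valid words of length k")
    return {w: (counts[w] + pseudocount) / total for w in words}

def standardized_euclidean(fx, fy, variances):
    """Euclidean distance between two frequency vectors, with each squared
    difference divided by a caller-supplied variance for that word."""
    return math.sqrt(sum((fx[w] - fy[w]) ** 2 / variances[w] for w in fx))

def kl_discrepancy(fx, fy):
    """Kullback-Leibler discrepancy sum_w fx[w] * log(fx[w] / fy[w]).
    Every word with fx[w] > 0 must have fy[w] > 0; use a pseudocount."""
    return sum(p * math.log(p / fy[w]) for w, p in fx.items() if p > 0)
```

For example, `kl_discrepancy(word_frequencies(x, 3, 1.0), word_frequencies(y, 3, 1.0))` compares two sequences `x` and `y` on 3-letter words. Note that the Kullback-Leibler discrepancy is not symmetric in its two arguments.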
Chapter 1 Introduction
1.1 Messenger RNA
1.2 Recombination of Genes
1.3 Mutation in the DNA
1.4 Dissimilarity Measures of DNA Sequence
1.5 Outline
Chapter 2 Distribution of the Number, Length and Position of Exons
2.1 Data Collection
2.2 Distribution of the Number of Exons
2.3 Distribution of the Length of Exons
2.4 Distribution of the Position of Exons
Chapter 3 An Evolution Model
3.1 Population Size per Generation
3.2 Creation of the Filial Generation
3.3 Factors of the Evolution Model
3.3.1 Mutation of the Evolution Model
3.3.2 Recombination of Genes of the Evolution Model
Chapter 4 A Simulation Study
4.1 The First Ancestor Sequences
4.2 Simulation Process
4.2.1 Population Size per Generation
4.2.2 Parameters of the Evolution Model
4.2.3 Dissimilarity Measures of DNA Sequence
4.3 Simulation Result
Chapter 5 Finding Significant Evolution Factors
5.1 The Analysis of Variance
5.2 The Logistic Regression Model
Chapter 6 Conclusion
Reference
Appendix