
National Digital Library of Theses and Dissertations in Taiwan


Detailed record

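Author: 張益峰
Author (English): Yi-Feng Chang
Thesis title: 萃取一致性序列特徵以預測人類啟動子
Thesis title (English): Human Promoter Prediction with Extracted Consensus Sequence Patterns
Advisor: 陳靖國
Advisor (English): Jeang-Kuo Chen
Degree: Master's
Institution: Chaoyang University of Technology
Department: Master's Program, Department of Information Management
Field: Computer science
Subfield: General computer science
Thesis type: Academic thesis
Year of publication: 2003
Graduation academic year: 91 (ROC calendar, i.e. 2002-2003)
Language: English
Number of pages: 110
Chinese keywords: 生物資訊 (bioinformatics); 啟動子預測 (promoter prediction); 類神經網路 (neural network); 加權法 (weighted-sum approach); 基因演算法 (genetic algorithm); 一致性序列特徵 (consensus sequence pattern)
English keywords: Bioinformatics; Consensus Sequence Pattern; Promoter Prediction; Weighted-Sum Approach; Neural Network; Genetic Algorithms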
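Chinese abstract: Past biological experiments have shown that promoters are usually located upstream of a gene's transcription start site. Understanding the common characteristics of human promoter sequences would therefore help us better understand the thirty to fifty thousand genes in the human genome. Although biologists have experimentally verified many promoter sequences, the experiments are time-consuming and labor-intensive, and large numbers of sequences tens of thousands of base pairs long cannot be fully covered by experiment. Many researchers have therefore tried to use the high-speed computation of bioinformatics to predict promoter sequences. However, current promoter prediction tools still cannot make accurate predictions on highly complex genome sequences, and their false positive rates are high, so promoter prediction has not yet become an effective reference for researchers searching for genes. In this thesis, we first download and extract promoter and non-promoter sequences from NCBI GenBank and build them into a promoter sequence database. Next, a genetic-algorithm-based consensus sequence pattern extraction program extracts consensus sequence patterns separately from the promoter and non-promoter sequences. A weighted-sum approach then uses the extracted patterns to predict promoters. Because the weighted-sum approach cannot assign appropriate weights precisely, and its high time complexity makes prediction slow, this thesis also proposes a two-phase learning neural network to improve prediction accuracy. The experimental results show that the proposed methods achieve higher prediction accuracy and lower false positive rates than the promoter prediction tools in the literature. The main reason is that the genetic algorithm finds a large number of uniformly distributed promoter-specific and non-promoter-specific consensus sequence patterns, and a larger set of patterns improves prediction accuracy. Future research directions include predicting the extent of promoter regions and comparing promoter consensus sequence patterns across species.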
A promoter region is a DNA sequence that is usually located upstream of the transcription start site (TSS) of a gene. If the hallmarks of known promoter sequences can be extracted, they can be used to recognize unknown promoter regions in unlabeled genome sequences. This in turn makes it possible to identify potential TSSs indirectly, and thereby to predict and explore the estimated thirty to fifty thousand human genes.
To date, molecular biologists have discovered and published many promoter sequences through biological experiments. However, discovering promoter sequences with traditional laboratory experiments is very time-consuming and costly. Many researchers have therefore turned to the high-throughput analysis offered by bioinformatics to predict promoter sequences. When existing promoter prediction tools are applied to unknown and complex DNA sequences, however, they yield either a low true positive rate or a high false positive rate. For these reasons, current promoter prediction tools still cannot serve as a reliable reference for gene identification.
In this thesis, we therefore first downloaded human promoter and non-promoter sequences from NCBI GenBank. After careful filtering, the sequences were stored in a promoter database of our own design, called Promoter Databank. Next, a genetic-algorithm-based consensus sequence extraction program derived promoter-specific and non-promoter-specific consensus sequence patterns from the promoter and non-promoter sequences, respectively. By applying a weighted-sum approach that uses the extracted consensus sequence patterns as detection signals, we can predict whether an unknown DNA sequence contains a promoter sequence.
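As a rough illustration of the genetic-algorithm-based extraction step, the following Python sketch evolves one fixed-length pattern that occurs in many promoter sequences and few non-promoter sequences. It is not the thesis's program: the fitness function (fraction of promoters matched minus fraction of non-promoters matched), the exact-substring matching, the operators, and all names and parameter values are assumptions made for this example.

```python
import random

ALPHABET = "ACGT"


def count_hits(pattern, sequences):
    """Number of sequences that contain the pattern as an exact substring."""
    return sum(pattern in seq for seq in sequences)


def fitness(pattern, promoters, non_promoters):
    """Hypothetical fitness: promoter hit fraction minus non-promoter hit fraction."""
    return (count_hits(pattern, promoters) / len(promoters)
            - count_hits(pattern, non_promoters) / len(non_promoters))


def mutate(pattern, rng, rate=0.1):
    """Replace each base with a random base with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base
                   for base in pattern)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length patterns (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mine_pattern(promoters, non_promoters, length=6,
                 pop_size=40, generations=30, seed=0):
    """Evolve one promoter-specific pattern; the top half survives each generation."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in range(length))
                  for _ in range(pop_size)]

    def score(pattern):
        return fitness(pattern, promoters, non_promoters)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return max(population, key=score)
```

Swapping the two sequence sets in the call would search for non-promoter-specific patterns instead. Because the best individuals are carried over unchanged, the best fitness never decreases from one generation to the next.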
However, the weighted-sum approach offers no way to find and assign a correct weight to every set of consensus sequence patterns. In addition, because it is based on string matching, it is time-consuming and unsuitable for online prediction. For these reasons, this thesis also proposes a two-phase neural network promoter prediction tool to improve both prediction accuracy and running time.
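A minimal sketch of what a string-matching weighted-sum score could look like is given below. The abstract does not state the scoring formula, so the rule used here (add a weight for each promoter-specific pattern occurrence, subtract a weight for each non-promoter-specific occurrence, compare with a threshold), the pattern strings, the weights, and the threshold are all illustrative assumptions.

```python
def count_occurrences(sequence, pattern):
    """Count exact, possibly overlapping, occurrences of pattern in sequence."""
    count = 0
    start = sequence.find(pattern)
    while start != -1:
        count += 1
        start = sequence.find(pattern, start + 1)
    return count


def weighted_sum_score(sequence, promoter_patterns, non_promoter_patterns):
    """Sum weighted promoter-pattern hits and subtract weighted non-promoter hits.

    Both pattern arguments map a pattern string to a hand-chosen weight
    (an assumption of this sketch).
    """
    score = 0.0
    for pattern, weight in promoter_patterns.items():
        score += weight * count_occurrences(sequence, pattern)
    for pattern, weight in non_promoter_patterns.items():
        score -= weight * count_occurrences(sequence, pattern)
    return score


def is_promoter(sequence, promoter_patterns, non_promoter_patterns, threshold=0.0):
    """Predict 'promoter' when the weighted sum exceeds the threshold."""
    return weighted_sum_score(sequence, promoter_patterns,
                              non_promoter_patterns) > threshold


# Toy patterns and weights, chosen only for illustration.
PROMOTER_PATTERNS = {"TATAAA": 2.0, "GGGCGG": 1.0}
NON_PROMOTER_PATTERNS = {"ATGGCC": 1.5}
```

Every pattern is scanned across the whole sequence, so the cost grows with the number of patterns times the sequence length, which is consistent with the running-time concern raised above. The weights must also be supplied by hand, which mirrors the difficulty of choosing correct weights.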
The experimental results show that, compared with existing promoter prediction tools, our proposed tools achieve higher true positive rates and lower false positive rates. We believe this is because genetic algorithms can extract a large number of uniformly distributed promoter-specific and non-promoter-specific consensus sequence patterns, and more sequence patterns lead to better prediction accuracy. Furthermore, because non-promoter sequences are included, the proposed tools can also reduce the false positive rate.
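For readers unfamiliar with the two measures, the sketch below computes them under the standard definitions, TP / (TP + FN) for the true positive rate and FP / (FP + TN) for the false positive rate. The thesis may define or report them differently; the function name and the toy data in the usage note are hypothetical.

```python
def true_and_false_positive_rates(predictions, labels):
    """Return (true positive rate, false positive rate) for boolean predictions.

    predictions[i] is True when sequence i is predicted to be a promoter;
    labels[i] is True when sequence i really is a promoter.
    """
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, l in pairs if p and l)
    fn = sum(1 for p, l in pairs if not p and l)
    fp = sum(1 for p, l in pairs if p and not l)
    tn = sum(1 for p, l in pairs if not p and not l)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one promoter and one non-promoter")
    return tp / (tp + fn), fp / (fp + tn)
```

With three promoters of which two are detected, and three non-promoters of which one is wrongly flagged, the function returns a true positive rate of 2/3 and a false positive rate of 1/3.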
In the future, the ideas and methods proposed in this thesis can be further applied to compare the promoter sequences of different organisms by adding other types of DNA sequences, such as repetitive sequences or intron sequences.
Abstract (Chinese)
Abstract
Acknowledgement
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1. Background and motivation
1.2. Problems of Promoter Prediction
1.3. Purpose
1.4. Research scope
1.5. Thesis organization
Chapter 2 Literature Review
2.1. Basic introduction to promoter
2.2. Related works of promoter prediction
2.2.1. Content/statistics-based approaches
2.2.2. Neural Network approaches
2.2.3. Hybrid approaches
2.3. Summary of literature review
Chapter 3 Mining Consensus Sequence Patterns by Genetic Algorithms
3.1. Introduction to the concept of mining consensus sequence patterns
3.2. Consensus sequence pattern
3.3. Introduction to Genetic Algorithms
3.4. Why using Genetic Algorithms
3.5. Procedure of mining consensus sequence patterns from training data
3.6. Fitness function
Chapter 4 Promoter Prediction by Extracted Consensus Sequence Patterns
4.1. Weighted-sum approach
4.2. Two-Phase Neural Network (TPNN)
4.2.1. Constraint of Artificial Neural Network
4.2.2. Introduction to TPNN
4.2.3. Learning procedures of TPNN
4.2.3.1. Learning procedure of Phase 1
4.2.3.2. Learning procedure of Phase 2
4.2.4. Recall procedure of TPNN
4.2.5. Back-Propagation algorithms
Chapter 5 Experiments and Results
5.1. Sequence data set
5.2. Experiment Results
5.2.1. Consensus sequence patterns extracted using Genetic Algorithms
5.2.2. Results by weighted-sum approach
5.2.3. Results from Two-Phase Neural Network
5.2.4. Comparison with other promoter prediction tools
5.2.5. Comparison of consensus sequence patterns with transcription factor binding sites
5.2.6. Discussion of the experiment results
Chapter 6 Conclusions and Future Directions
6.1. Summary and Conclusions
6.2. Future Directions
References
Appendix
Accession numbers and length of the training promoter sequences
Accession numbers and length of the training mRNA sequences
Accession numbers and length of the testing promoter sequences
Accession numbers and length of the testing mRNA sequences