National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Graduate student: 張益峰 (Yi-Feng Chang)
Thesis title: 萃取一致性序列特徵以預測人類啟動子 (Human Promoter Prediction with Extracted Consensus Sequence Patterns)
Advisor: 陳靖國 (Jeang-Kuo Chen)
Degree: Master's
Institution: Chaoyang University of Technology (朝陽科技大學)
Department: Master's Program, Department of Information Management
Field of study: Computer science
Subfield: General computer science
Thesis type: Academic thesis
Year of publication: 2003
Academic year of graduation:
Language: English
Number of pages: 110
Keywords: Bioinformatics, Promoter Prediction, Neural Network, Weighted-Sum Approach, Genetic Algorithms, Consensus Sequence Pattern


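Past biological experiments show that a promoter is usually located upstream of a gene's transcription start site. If the common characteristics of human promoter sequences can be understood, the thirty to fifty thousand genes in the human genome can be understood better as well. Although biologists have verified many promoter sequences experimentally, the experiments are time-consuming and laborious, and sequences that are numerous and tens of thousands of base pairs long cannot be characterized completely by experiment. Many researchers have therefore tried to use the high-speed computation of bioinformatics to predict promoter sequences. However, current promoter prediction tools still cannot make accurate predictions on highly complex genomic sequences, and their false positive rates are high, so promoter prediction cannot yet serve as an effective reference for researchers searching for genes. This thesis therefore first downloads and extracts promoter and non-promoter sequences from NCBI GenBank and builds them into a promoter sequence database. Next, a genetic-algorithm-based consensus sequence pattern extraction program extracts consensus sequence patterns separately from the promoter sequences and the non-promoter sequences. A weighted-sum approach then uses the extracted consensus sequence patterns to predict promoters. Because the weighted-sum approach cannot assign appropriate weights precisely, and its high time complexity makes prediction slow, this thesis also proposes a neural network with two-phase learning to improve prediction accuracy. The experimental results show that the proposed methods achieve a higher prediction accuracy and a lower false positive rate than the promoter prediction tools in the current literature. The main reason is that the genetic algorithm can find a large number of evenly distributed promoter-specific and non-promoter-specific consensus sequence patterns, and a larger number of sequence patterns raises prediction accuracy. Future research directions include predicting the extent of promoter regions and comparing promoter consensus sequence patterns across species.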

A promoter region is a DNA sequence that is usually located upstream of the transcription start site (TSS) of a gene. If the hallmarks of known promoter sequences can be extracted, they can be used to recognize unknown promoter regions in unlabeled genome sequences. This in turn makes it possible to identify potential TSSs indirectly, and thus to predict and explore the estimated thirty to fifty thousand human genes.
To date, molecular biologists have discovered and published many promoter sequences through biological experiments. However, discovering promoter sequences with traditional lab experiments is time-consuming and costly. Many researchers have therefore turned to the high-throughput analysis offered by bioinformatics to predict promoter sequences. But when existing promoter prediction tools are applied to unknown, complex DNA sequences, they yield either a low true positive rate or a high false positive rate. For these reasons, present promoter prediction tools still cannot serve as a reference basis for gene identification.
Hence, in this thesis we first downloaded human promoter and non-promoter sequences from NCBI GenBank. After careful filtering, these promoter sequences were saved into a purpose-built promoter database called Promoter Databank. Next, a genetic-algorithm-based consensus sequence extraction program derived promoter-specific and non-promoter-specific consensus sequence patterns from the promoter and non-promoter sequences. By applying a weighted-sum approach with the extracted consensus sequence patterns as detection signals, we can predict whether an unknown DNA sequence contains promoter sequences.
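As a rough illustration of the genetic-algorithm-based pattern extraction described above, the following Python sketch evolves one short consensus pattern that is common in promoter sequences and rare in non-promoter sequences. Everything in it is an assumption made for illustration, not the thesis's actual design: the `N` wildcard, the fitness definition, truncation selection, one-point crossover, the mutation rate, and the names `matches`, `fitness`, and `evolve`.

```python
import random

ALPHABET = "ACGTN"  # assumption: N is a wildcard that matches any base


def matches(pattern, seq):
    """Return True if pattern occurs anywhere in seq (N matches any base)."""
    k = len(pattern)
    return any(
        all(p == "N" or p == s for p, s in zip(pattern, seq[i:i + k]))
        for i in range(len(seq) - k + 1)
    )


def fitness(pattern, promoters, non_promoters):
    """Hypothetical fitness: hit rate in promoters minus hit rate in non-promoters."""
    if pattern.count("N") > len(pattern) // 2:
        return -1.0  # penalise patterns that are mostly wildcards
    pos = sum(matches(pattern, s) for s in promoters) / len(promoters)
    neg = sum(matches(pattern, s) for s in non_promoters) / len(non_promoters)
    return pos - neg


def evolve(promoters, non_promoters, length=6, pop_size=30, generations=40, seed=0):
    """Evolve a single pattern of the given length and return the fittest one."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(pop_size)]

    def score(p):
        return fitness(p, promoters, non_promoters)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]        # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)    # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.2:            # occasional point mutation
                child[rng.randrange(length)] = rng.choice(ALPHABET)
            children.append("".join(child))
        pop = parents + children
    return max(pop, key=score)
```

Running `evolve` once on the promoter set and once with the two arguments swapped would give one promoter-specific and one non-promoter-specific pattern; collecting many patterns, as the abstract describes, would need repeated runs or a population-level output.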
However, the weighted-sum approach offers no way to find and assign a correct weight to every set of consensus sequence patterns. Moreover, because it is based on string matching, it is time-consuming and unsuitable for online prediction. For these reasons, this thesis also proposes a two-phase neural network promoter prediction tool to improve prediction accuracy and reduce prediction time.
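The weighted-sum scoring discussed above can be pictured with a minimal Python sketch. It assumes exact string matching, hand-picked weights, and a fixed decision threshold; the patterns, weight values, threshold, and function names are all hypothetical and are not taken from the thesis.

```python
def count_hits(pattern, seq):
    """Count (possibly overlapping) exact occurrences of pattern in seq."""
    k = len(pattern)
    return sum(seq[i:i + k] == pattern for i in range(len(seq) - k + 1))


def weighted_sum_score(seq, weights):
    """Sum weight * hit count over all patterns; weights maps pattern -> weight."""
    return sum(w * count_hits(p, seq) for p, w in weights.items())


def predict_promoter(seq, weights, threshold=0.0):
    """Call a sequence a promoter when its score exceeds the threshold."""
    return weighted_sum_score(seq, weights) > threshold


# Hypothetical weights: positive for promoter-specific patterns,
# negative for non-promoter-specific ones.
WEIGHTS = {"TATAAA": 2.0, "CCAAT": 1.0, "GGGGCC": -1.5}

print(predict_promoter("GGTATAAAGGCCAATCC", WEIGHTS))  # score 3.0
print(predict_promoter("GGGGCCGGGGCC", WEIGHTS))       # score -3.0
```

Each pattern is scanned across the whole sequence, so the cost grows with the number of patterns times the sequence length, which is consistent with the running-time concern raised above. The sketch also shows why weights are hard to set: nothing in the procedure itself says what value each pattern should receive.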
The experimental results show that, compared with other existing promoter prediction tools, our proposed tools achieve higher true positive rates and lower false positive rates. We believe this is because genetic algorithms can extract a large number of uniformly distributed promoter-specific and non-promoter-specific consensus sequence patterns, and more sequence patterns lead to better prediction accuracy. Furthermore, because non-promoter sequences are included, the proposed tools can also reduce the false positive rate.
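For readers unfamiliar with the two measures, the sketch below computes them under the common sequence-level definitions: true positive rate = TP / (TP + FN) and false positive rate = FP / (FP + TN). This is an assumption; the thesis may count positives differently (for example per predicted site rather than per sequence), and the labels in the example are made up.

```python
def tp_fp_rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Made-up example: 3 promoters and 4 non-promoters.
y_true = [True, True, True, False, False, False, False]
y_pred = [True, True, False, True, False, False, False]
tpr, fpr = tp_fp_rates(y_true, y_pred)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```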
In the future, the ideas and methods proposed in this thesis can be applied to comparing the promoter sequences of different organisms by adding other types of DNA sequences, such as repetitive sequences or intron sequences.

Abstract (Chinese)
Abstract
Acknowledgement
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1. Background and motivation
1.2. Problems of Promoter Prediction
1.3. Purpose
1.4. Research scope
1.5. Thesis organization
Chapter 2 Literature Review
2.1. Basic introduction to promoter
2.2. Related works of promoter prediction
2.2.1. Content/statistics-based approaches
2.2.2. Neural Network approaches
2.2.3. Hybrid approaches
2.3. Summary of literature review
Chapter 3 Mining Consensus Sequence Patterns by Genetic Algorithms
3.1. Introduction to the concept of mining consensus sequence patterns
3.2. Consensus sequence pattern
3.3. Introduction to Genetic Algorithms
3.4. Why using Genetic Algorithms
3.5. Procedure of mining consensus sequence patterns from training data
3.6. Fitness function
Chapter 4 Promoter Prediction by Extracted Consensus Sequence Patterns
4.1. Weighted-sum approach
4.2. Two-Phase Neural Network (TPNN)
4.2.1. Constraint of Artificial Neural Network
4.2.2. Introduction to TPNN
4.2.3. Learning procedures of TPNN
4.2.3.1. Learning procedure of Phase 1
4.2.3.2. Learning procedure of Phase 2
4.2.4. Recall procedure of TPNN
4.2.5. Back-Propagation algorithms
Chapter 5 Experiments and Results
5.1. Sequence data set
5.2. Experiment Results
5.2.1. Consensus sequence patterns extracted using Genetic Algorithms
5.2.2. Results by weighted-sum approach
5.2.3. Results from Two-Phase Neural Network
5.2.4. Comparison with other promoter prediction tools
5.2.5. Comparison of consensus sequence patterns with transcription factor binding sites
5.2.6. Discussion of the experiment results
Chapter 6 Conclusions and Future Directions
6.1. Summary and Conclusions
6.2. Future Directions
References
Appendix
Accession numbers and length of the training promoter sequences
Accession numbers and length of the training mRNA sequences
Accession numbers and length of the testing promoter sequences
Accession numbers and length of the testing mRNA sequences

