National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

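Author: 彭俊澄 (Chun-Cheng Peng)
Thesis title: E. coli Promoter Prediction Using Expectation Maximization Algorithms, Fuzzy Sets and Neural Networks (利用最大期望演算法、模糊集合與類神經網路實現大腸桿菌啟動子之預測)
Advisor: 林正堅 (Cheng-Jian Lin)
Degree: Master's
Institution: Chaoyang University of Technology (朝陽科技大學)
Department: Master's Program, Department of Information Engineering (資訊工程系碩士班)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2003
Academic year of graduation: 91 (ROC calendar)
Language: English
Number of pages: 71
Keywords: Bioinformatics (生物資訊); E. coli (大腸桿菌); promoter prediction (啟動子預測); encoding methods (編碼方法); expectation maximization algorithm (最大期望演算法); fuzzy sets (模糊集合); neural networks (類神經網路)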
The genome of Escherichia coli K-12 was fully sequenced in 1997. Its 4,639,221-base-pair DNA sequence contains 4,288 annotated protein-coding genes, 38 percent of which have no attributed function. A major problem in prokaryotic promoter prediction is that the length of the promoter elements and their distance from the transcription start site are not known in advance: the spacers between the -35 box and the -10 box, and between the -10 box and the transcription start site, have to be located. In this thesis, promoter regions are located accurately with an expectation maximization (EM) algorithm, and the most representative sequence features are then used to train neural networks. Most related studies instead feed a wider range of sequence directly to the learning system, which places a heavy load on both computation and memory. Even after EM feature extraction, encoding the extracted features with the traditional orthogonal coding method does not remove this burden. We therefore develop a new encoding method based on purines and pyrimidines. It greatly reduces the dimensionality of the training data, and simulation results show that its promoter prediction precision is close to that obtained with traditional orthogonal encoding.
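The contrast between the two encodings mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's method: the actual codebook (Table 5.1, "Pseudo codebook for r=2") is not reproduced in this record, so the bit assignments below, the function names, and the use of the commonly cited -10 consensus TATAAT as sample input are all assumptions. The sketch only shows why a purine/pyrimidine code needs fewer inputs per nucleotide than an orthogonal (one-hot) code; it does not attempt to reproduce the 150- and 9-dimension data sets listed in the tables.

```python
# Illustrative sketch only. The mappings are assumptions, not the thesis's codebook.

# Orthogonal (one-hot) code: four values per nucleotide.
ORTHOGONAL = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

PURINES = {"A", "G"}      # two-ring bases
PYRIMIDINES = {"C", "T"}  # one-ring bases


def orthogonal_encode(seq):
    """One-hot encode a DNA string; the result has 4 * len(seq) values."""
    out = []
    for base in seq.upper():
        out.extend(ORTHOGONAL[base])
    return out


def purine_pyrimidine_encode(seq):
    """Encode each base as 1 (purine) or 0 (pyrimidine); len(seq) values."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append(1)
        elif base in PYRIMIDINES:
            out.append(0)
        else:
            raise ValueError(f"unexpected nucleotide: {base!r}")
    return out


if __name__ == "__main__":
    box = "TATAAT"  # commonly cited -10 consensus, used here only as sample input
    print(len(orthogonal_encode(box)), orthogonal_encode(box))
    print(len(purine_pyrimidine_encode(box)), purine_pyrimidine_encode(box))
```

The one-bit code discards the distinction between A and G and between C and T, so any dimensionality saving of this kind comes at the cost of information per position; the abstract reports that prediction precision nevertheless stayed close to that of orthogonal encoding.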
TABLE OF CONTENTS
1 INTRODUCTION
2 PROKARYOTIC PROMOTERS
2.1 The Microbe E. coli
2.2 DNA Sequence Structure
2.3 Characteristics of Prokaryotic Promoters
3 PREDICTION OF WHOLE GENOME SEQUENCE VIA FUZZY NEURAL NETWORKS
3.1 Data Preprocessing and Encoding
3.2 Fuzzy Neural Networks
3.2.1 Architecture
3.2.2 An On-line Learning Algorithm
3.2.2.1 The Structure Learning Algorithm
3.2.2.2 The Parameter Learning Algorithm
3.3 Simulation Results and Discussions
4 PREDICTION OF (EM) EXTRACTED SEQUENCE VIA NEURAL NETWORKS
4.1 Data Preprocessing and Encoding
4.2 The Expectation Maximization Algorithm
4.2.1 The MLE Problem for the Incomplete Data Set
4.2.2 The Basic EM Algorithm
4.2.3 Spacers Locating via EM Algorithm
4.3 Neural Networks
4.3.1 Multilayer Feed-forward Networks
4.3.1.1 Architectures
4.3.1.2 Backpropagation Learning Algorithms
4.3.1.3 Levenberg-Marquardt Learning Algorithm
4.3.1.4 Conjugate Gradient Learning Algorithm
4.3.2 Learning Vector Quantization Networks
4.4 Simulation Results and Discussions
5 PREDICTION OF PURINE-PYRIMIDINE ENCODED SEQUENCE VIA NEURAL NETWORKS
5.1 Purine-Pyrimidine Encoding Approach
5.2 Simulation Results and Discussions
6 CONCLUSION AND FUTURE WORKS
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1 The sequenced genomes of E. coli
Table 2.2 Sigma factors and their contact sequences of E. coli
Table 4.1 Simulation results for the 150-dimension data set
Table 5.1 Pseudo codebook for r=2
Table 5.2 Simulation results for the 9-dimension data set

LIST OF FIGURES

Figure 2.1 Promoters are important elements for gene expression
Figure 2.2 Transcription/translation of a typical prokaryotic gene
Figure 2.3 The typical prokaryotic promoter
Figure 2.4 Distribution of nucleotides around transcription start sites (position 51) of 115 E. coli promoter sequences
Figure 3.1 Architecture of FNN
Figure 3.2 The learning algorithm of FNN
Figure 3.3 Prediction results of FNN
Figure 3.4 Prediction errors: three-layer neural network vs. FNN
Figure 4.1 The EM algorithm for E. coli spacers locating
Figure 4.2 The multilayer feedforward network
Figure 5.1 The process of encoding methods