(18.206.238.77) 您好!臺灣時間:2021/05/18 06:05
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:陳建成
研究生(外文):Chien-Cheng Chen
論文名稱:分子間三維子結構之比對與其應用於蛋白質功能分類
論文名稱(外文):On 3D Substructure Matching of Molecules and Its Applications in Protein Function Classification
指導教授:歐陽明歐陽明引用關係
指導教授(外文):Ming Ouhyoung
學位類別:博士
校院名稱:國立臺灣大學
系所名稱:資訊工程學研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2004
畢業學年度:92
語文別:英文
論文頁數:98
中文關鍵詞:活化區域幾何雜湊法疊代式最近點演算法結構比對蛋白質功能分類
外文關鍵詞:structure alignmentprotein function classificationactive sitegeometric hashingIterative Closest Point
相關次數:
  • 被引用被引用:0
  • 點閱點閱:88
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
藉由實驗的技術越來越多的蛋白質可取得其三維結構的資料,例如,著名的蛋白質資料庫PDB (Protein Data Bank)目前已儲存了超過 25,000 個蛋白質的三維結構,而且數量與日俱增。同時,也有很多分析分子結構與視覺化的工具被開發出來。然而,人們所關心的是:這些蛋白質的功能是什麼呢?是否與其它蛋白質具有類似的活化區域呢?蛋白質的功能與活化區域有著高度的相關性,蛋白質纏繞成複雜的三維結構,藉著其活化區域適當地與其他分子交互作用來實現它的功能。因此,如何藉由蛋白質的三維結構,進而得知蛋白質的功能及活化區域的位置,已變成一個研究上重要而受人矚目的問題。而面對這些課題,我們將它轉換成蛋白質三維部份結構比對的問題來解決。
在本篇論文中,我們提出蛋白質三維子結構比對的技術與其應用。首先,我們提出一個兩階段以幾何雜湊 (Geometric Hashing) 為基礎的分子間(不限於蛋白質)相似性比對架構,此第一階段乃運用【幾何雜湊演算法】(Geometric Hashing Algorithm) 在分子間做整體性初步地配對,而第二階段乃運用【疊代式最近點演算法】(Iterative Closest Point Algorithm) 做局部的最佳化微調,直到配對的原子對在某一門檻距離下,不再增加為止。此架構用於分子間的比對,對於蛋白質EFG與非蛋白質EF-tu/tRNA複合物、蛋白質RRF與tRNA等比對的結果都相當好。另外,我們也與目前四個最常用且相互競爭的方法 (Yale, Dali, CE, Lund) 做比較,結果顯示以RMSD和相配對的原子數為標準來衡量,我們的方法是最好的,然而,我們的方法需要較多的計算時間。
此外,運用這個比對架構,我們可以預測蛋白質可能的功能及活化區域,其主要的想法是:兩個蛋白質間即使序列不相似,但是,如果兩者之與功能息息相關的活化區域具有極其相似的三維幾何形狀,則它們仍然可能有類似的功能。我們建立了一個活化區域的資料庫,對一個有興趣知道功能的白質,我們將之與活化區域資料庫中的所有活化區域做相似性比對。我們的方法最終會選出一個在蛋白質中有最相像結構的活化區域,而這個活化區域的原始功能類似,即是我們預測這個蛋白質所應該擁有的功能。此外,蛋白質比對出來的最相像子結構,則是活化區域的可能位置。依據實驗結果顯示,藉著一些鑑別器的輔助,保守地估計酵素分類第四類的正確率約為42.12%,第五類約為79.06%,第六類約為60.78%。此外,我們也估算出比較高的正確率上限值。憑藉著這些研究的成果,我們期望能夠幫助生化學家儘快地測知未知蛋白質的功能,甚至發現已知蛋白質的新功能。


With the explosion of protein sequences and structures storing into databanks, it is highly desirable to explore feasible alignment and classification methods for newly found protein enzymes classified into their respective enzyme classes by means of an automated procedure. This is indeed important because knowing which class an enzyme belongs to may help deduce its catalytic mechanism and specificity, giving clues to relevant biological functions. Traditional experimental methods are both time-consuming and costly. However, these requirements can be met by transforming them into a 3D common substructure matching problem among proteins. In this dissertation, I have proposed a two-pass geometric hashing based framework, the Geometric Hashing and Iterative Closest Point algorithms, to handle the 3D substructure matching problem of proteins. Two techniques, alpha shape and 3D reference frame schemes for feature extraction, are used to reduce the computation cost.
Our methods aligned two molecules (not just proteins) based on 3D structural data. Two main experiments are conducted based on the data from the PDB. The first is to solve the molecular alignment problem, where, for example, the similarity between a protein EFG and a non-protein EF-tu/tRNA complex is calculated. We also compare with four popular tools (Yale, Dali, CE and Lund) with six protein pairs, and the results show that our method improves over the others in terms of RMSD and the number of matched atom pairs. However, our method is computationally more expensive. In the second experiment, which is also an innovative application, active site residues in a protein can be aligned and matched against other proteins with different enzyme classification numbers. With the consideration of atom type and binding surface area as discriminators, this method can reach an accuracy rate of 42.12% for enzymes classified in EC4, and 79.06% for EC5, and over 60.0% for EC6 conservatively, and a higher upper bound for accuracy rate is also evaluated. Both experiments demonstrate that the proposed methods are useful and versatile.


Content
中文摘要 1
ABSTRACT 2
CHAPTER 1. INTRODUCTION 3
1.1 MOTIVATION 5
1.2 THESIS STATEMENT 7
1.3 CONTRIBUTIONS 8
1.4 PROTEIN STRUCTURES AND NUCLEIC ACIDS 9
1.5 ORGANIZATION 13
CHAPTER 2. RELATED WORK 14
2.1 SUBSTRUCTURE IDENTIFICATION OF PROTEINS 15
2.2 SEQUENCE ALIGNMENT 17
2.3 STRUCTURAL ALIGNMENT 19
2.4 PUBLIC DOMAIN ALIGNMENT TOOLS 21
CHAPTER 3. ALIGNMENT OF PROTEINS BASED ON 3D STRUCTURAL DATA 23
3.1 OVERVIEW 23
3.2 GEOMETRIC HASHING ALGORITHM 26
3.3 ALPHA SHAPE ALGORITHM 30
3.4 3D REFERENCE FRAME 32
3.5 ITERATIVE CLOSEST POINT ALGORITHM 35
3.6 THE MOLECULAR ALIGNMENT PROBLEM 37
CHAPTER 4. 3D SUBSTRUCTURE MATCHING OF PROTEINS 45
4.1 OVERVIEW 45
4.2 ACTIVE SITE RETRIEVAL 50
4.3 PROTEIN FUNCTION CLASSIFICATION AND PREDICTION 54
4.4 ONE WAY TO IMPROVE THE ACCURACY : THE MINIMAL BINDING SURFACE PROBLEM 62
4.5 DISCRIMINATORS FOR PROTEIN FUNCTION CLASSIFICATION AND PREDICTION 67
CHAPTER 5. CONCLUSION AND FUTURE WORK 73
5.1 CONCLUSION 73
5.2 FUTURE WORK 75
REFERENCES 76
APPENDICES 82
A. THE EXPERIMENTAL DATA OF PROTEIN FUNCTION CLASSIFICATION AND PREDICTION 82
A.1 A Summary of the Numbers of Site Structure and Test Protein 82
A.2 A Detailed List of Test Proteins 86
A.3 A Detailed List of Site Database 91



[ALTS90]Altschul,S., Gish,W., Miller,W., Myers,E.W., Lipman,D.J., Basic local alignment search tool, J. Mol. Biol., 215, 403–410, 1990.
[ANAD03]Anand,K., Ziebuhr,J., Wadhwani,P., Mesters, JR., and Hilgenfeld,R., Coronavirus main proteinase (3CLpro) structure: basis for design of anti-SARS drugs, Science, 300, 1763-1767, 2003.
[ARTY97]Artymiuk,P.J., Poirrette,A.R., Rice,D.W. and Willett,P., A polymerase I palm in adenylyl yclase, Nature, 388, 33-34, 1997.
[BAIR97]Bairoch,A. and Apweiler,R., The SWISS-PROT protein sequence data bank and its supplement TrEMBL, Nucleic Acids Research, 25, 31-36, 1997. http://www.expasy.ch/
[BESL92]Besl,P.J., and McKay,N.D., A method for registration of 3d shapes, IEEE Trans. on Pattern Recognition and Machine Intelligence, 14, 239–256, 1992.
[BLAN03]Blankenbecler,R., Ohlsson,M., Peterson,C., and Ringner,M., Matching protein structures with fuzzy alignments, In PNAS, 100, 11936–11940, 2003.
[BRAD00]Brady,G.P., and Stouten,P.F.W., Fast prediction and visualization of protein binding pockets with PASS, J. Comput Aided Mol. Design, 14, 383–401, 2000.
[BRYA97]Bryant,S.H., Madej,T., Janin,J., Liu,Y., Ruoho,A., Zhang,G. and Hurley,J.H., A polymerase I palm in adenylyl cyclase A reply, Nature, 388, 34, 1997.
[CAI04]Cai,C.Z., Han,L.Y., Ji,Z.L., and Chen,Y.Z., Enzyme family classification by support vector machines, Proteins: Structure, Function and Bioinformatics, 55, 66-76, 2004.
[CAN03a]Can,T., Wang,Y., Wang,Y.F., and Su,J., FPV: Fast protein visualization using Java 3D, Bioinformatics, 19, 0-10, 2003.
[CAN03b]Can,T., and Wang,Y.F., CTSS: A robust and efficient method for protein structure alignment based on local geometrical and biological features, Proceedings of the IEEE Computer Society Bioinformatics Conference Palo Alto, CA, 2003.
[CHAK99]Chakraborty,S. and Biswas,S., Approximation algorithms for 3D common substructure identification in drug and protein molecules, In Proc. 6th. International Workshop on Algorithms and Data Structures, LNCS 1663, pages 253-264, 1999.
[CHAN02]Chang,B.C., and Halgamuge,S.K., Protein motif extraction with neuro-fuzzy optimization, Bioinformatics, 18, 1084–1090, 2002.
[CHAN04]Chang,D.T.H., Chen,C.Y., Oyang,Y.J., Juan,H.F. and Huang,H.C., ProteMiner-SSM: A web server for efficient analysis of similar protein tertiary substructures, Nucleic Acids Research, Vol. 32, Web Server Issue, July 2004.
[CHEN02a]Chen,S.C. and Chen,T., Retrieval of 3D protein structure, IEEE International Conference on Image Processing, 2002
[CHEN02b]Chen,S.C. and Chen,T., Protein retrieval by matching 3D surfaces, In Genomic Signal Processing and Statistics (GENSIPS), Raleigh, North Carolina, USA., 2002.
[CHEN03]Chen,D.Y., Tlan,X.P., Shen,Y.T., and Ouhyoung,M., On visual similarity based 3D model retrieval, Computer Graphics Forum, 22, 3, 223–232. Also in EuroGraphics2003, Spain, 2003.
[CHEN04]Chen,C.C., Tu,J.T., Chang,P.K., Chen,B.Y., Liang,R.H. and Ouhyoung,M., Protein function prediction by matching 3D structural data, Proc. of NICOGRAPH International 2004, p.113-p.120, Hsinchu, Taiwan, 2004
[CHEW99]Chew,L.P., Kedem,K., Kleinberg,J. and Huttenlocher,D.P, Fast detection of common geometric substructure in proteins, In Proc. RECOMB’99 – 3rd. Annual International Conference on Computational Molecular Biology, April 1999.
[CHOU03]Chou,K.C. and Elrod,D.W., Prediction of enzyme family classes, Journal of Proteome Research, 2, 183-190, 2003.
[COHE77]Cohen,S.S., A Strategy for the Chemotheraphy of Infectious Disease, Science, Vol. 197, No. 4302, 431-432, 1977.
[CONN83]Connolly, M.L., Analytical molecular surface calculation, J. Appl. Cryst., 16, 548-558, 1983.
[DELC75]Delcoigne,A., Hansen,P., Sequence comparison by dynamic programming, Biometrika, 62, 661–664., 1975.
[DENG02]Deng,M., Zhang,K., Mehta,S., Chen,T., and Sun,F., Prediction of protein function using protein-protein interaction data, In IEEE Computer Society Bioinformatics Conference, 197-206, 2002.
[DESJ97]DesJardins,M., Karp,PD., Krummenacker,M., Lee,TJ., and Ouzounis,CA., Prediction of enzyme classification from protein sequence without the use of sequence similarity, ISMB, 92-98, 1997.
[EDDY96]Eddy,S., Hidden Markov models, Curr. Opin. Struct. Biol., 6, 361-365, 1996.
[EDEL94]Edelsbrunner,H. and Mucke,E.P., Three-dimensional alpha shapes, ACM Trans. Graphics, 13, 43-72, 1994. http://www.alphashapes.org/alpha/index.html
[ENZY92]Enzyme Nomenclature, NC-IUBMB, Academic Press, New-York, 1992. http://www.biochem.ucl.ac.uk/bsm/enzymes/
[FISC92]Fischer,D., Bachar,O., Nussinov,R. and Wolfson,H., An efficient automated computer vision based technique for detection of three dimensional structural motifs in proteins, J. Biomol. Struct. Dynam., 9, 769–789, 1992.
[FISC96]Fischer,D., Eisenberg,D., Protein fold recognition using sequence derived predictions, Prot. Sci, 5:947–955, 1996.
[FUNK03]Funkhouser,T., Min,P., Kazhdan,M., Chen,J., Halderman,A., Dobkin,D., and Jacobs,D., A search engine for 3D models, ACM TOG, 22, 1, 83–105, 2003.
[GERS96]Gerstein,M. and Levitt,M., Using iterative dynamic programming to obtain accurate pair-wise and multiple alignments of protein structures, In Proc. Fourth Int. Conf. On Intell. Sys. For Mol. Biol., Menlo Park, CA: AAAI Press. Pp 59-67, 1996.
[HOLM96]Holm,L. and Sander,C., Mapping the protein universe, Science, 273, 595-602, 1996.
[HOLM98]Holm,L., and Sander,C., Touring protein fold space with Dali/FSSP, Nucl. Acids Res. 26, 316-319, 1998. http://www2.ebi.ac.uk/dali/
[HUAN03]Huang,X. and Chao,K.M., A generalized global alignment algorithm, Bioinformatics, 19, 228-233, 2003.
[JENS02]Jensen,LJ, Gupta,R., Blom,N., Devos,D., Tamames,J., Kesmir,C., Nielsen,H., Staerfeldt,HH, Rapacki, K., Workman,C., Andersen,CA, Knudsen,S., Krogh,A., Valencia,A., and Brunak,S., Prediction of human protein function from post-translational modifications and localization features, J. Mol. Biol., 319, 1257-65, 2002.
[LAMD88]Lamdan,Y., Wolfson,H.J., Geometric hashing: A general and efficient model-based recognition scheme, In Proc. of second ICCV, 238-289, 1988.
[LATH94]Lathrop,R.H., The protein threading problem with sequence amino acid interaction preferences is NP-complete, Protein Engineering, 7, 1059–1068, 1994.
[LIAN98]Liang,J., Edelsbrunner,H., Fu,P., Sudharkar,P.V. and Subramaniam,S., Analytic shape computation of macromolecules II: inaccessible cavities in proteins, Proteins: Structure, Function, and Genetics, 33, 18-29, 1998.
[LIAN04]Liang,H., and Landweber,L.F., Computational tests of molecular mimicry between tRNA and protein translation factors, submitted, 2004.
[MILI03]Milik,M., Szalma,S., and Olszewski,K.A., Common structural cliques: a tool for protein structure and function analysis, Protein Engineering, 16, 8, 543–552, 2003.
[MURR02]Murray,K.B., Gorse,D., and Thornton,J.M., Wavelet transforms for the characterization and detection of repeating motifs, J. Mol. Biol., 316, 2, 341–363, 2002.
[MURZ95]Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C., SCOP: a structural classification of proteins for the investigation of sequences and structures, J. Mol. Biol., 247, 536-540, 1995.
[NAJM00]Najmanovich,R., Kuttner,J., Sobolev,V., and Edelman,M., Side-chain flexibility in proteins upon ligand binding, Proteins, 39, 3, 261–268, 2000.
[NEED70]Needleman,S.B., and Wunsch,C.D., A general method applicable to the search for similarities in the amino acid sequence of two proteins, J. Mol. Biol., 48, 443–453, 1970.
[NISS95]Nissen,P., Kjeldgaard,M., Thirup,S., Polekhina,G., Reshetnikova,L., Clark,B.F. and Nyborg,J., Crystal structure of the ternary complex of Phe-tRNAPhe, EF-Tu, and a GTP analog, Science, 270, 1464-1472, 1995.
[NISS00]Nissen,P., Kjeldgaard,M., and Nyborg,J., Macromolecular mimicry, EMBO J., 19, 489-495, 2000.
[ONDR01]Ondrechen,M.J., Clifton,J.G., and Ringe,D., Thematics: A simple computational predictor of enzyme function from structure, In PNAS, 98, 12473–12478, 2001.
[OREN96]Orengo,C.A. and Taylor,W.R., SSAP: sequential structure alignment program for protein structure comparison, Methods Enzymol., 266, 617-635, 1996.
[OREN97]Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M., CATH—a hierarchic classification of protein domain structures, Structure, 5, 1093-1108, 1997.
[PEAR88]Pearson,W. and Lipman,D. Improved tools for biological sequence comparison, Proc. Natl Acad. Sci., USA, 85, 2444-2448, 1988.
[PENN94]Pennec,X., Ayache,N., An O (n2) algorithm for 3D substructure matching of proteins, In Califano,A., Rigoutsos,I. and Wolson,H.J. editors, Shape and Pattern Matching in Computational Biology- Proc. First Int. Workshop, Plenum Pubishing, 25-40, 1994.
[PENN98]Pennec,X. and Ayache,N., A geometric algorithm to find small but highly similar 3D substructures in proteins, Bioinformatics, 14, 516-522. 1998.
[PROT95]Protein Data Bank, Quarterly Newsletter, No. 71, Brookhaven National laboratory, 1995. http://www.rscb.org/pdb
[RACK80]Rackovsky,S., and Scheraga,H.A., Differential geometry and polymer conformations 2: Development of a conformational distance function, Macromolecules, 13, 1440-1453, 1980.
[ROCK02]Rockey,W.M., and Elcock,A.H., Progress toward virtual screening for drug side effects, Proteins, 48, 4, 664–671, 2002.
[SCHW78]Schwartz,R.M. and Dayhoff,M.O., Atlas of Protein Sequence and Structure, National biomedical research foundation Washington DC, 5, 353-358, 1978.
[SELM99]Selmer,M., Al-Karadaghi,S., Hirokawa,G., Kaji,A. and Liljas,A., Crystal structure of Thermotoga maritima ribosome recycling factor: a tRNA mimic, Science, 286, 2349-2352, 1999.
[SETU97]Setubal,J. and Meidanis,J., Introduction to Computational Molecular Biology, PWS Publishing Company, 1997.
[SHIH03]Shih,E.S.C., and Hwang,M.J., Protein structure comparison by probability-based matching of secondary structure elements, Bioinformatics, 19, 735-741, 2003.
[SHIN98]Shindyalov,I.N., and Bourne,P.E., Protein structure alignment by incremental combinatorial extension (CE) of the optimal path, Protein Engineering, 11, 9, 739–747, 1998. http://cl.sdsc.edu/ce.html
[SING97]Singh,A.P. and Brutlag,D.L., Hierarchical protein structure superposition using both secondary structure and atomic representations, In Proc. Fifth Int. Conf. On Intell. Sys. For Mol. Biol., Menlo Park, CA: AAAI Press. Pp 284-293, 1997.
[SING98]Singh,A.P. and Brutlag,D.L., Protein structure alignment: A comparison of methods., 1998.
[SMIT81]Smith,T.F., and Waterman,M.S., Identification of common molecular subsequences, J. Mol. Biol., 147:195–197, 1981.
[TIAN03]Tian,W. and Skolnick,J., How well is enzyme function conserved as a function of pair wise sequence identity?, J. Mol. Biol., 333, 863–882, 2003.
[TSAI96]Tsai,C.J., Lin,S.L., Wolfson,H.J., and Nussinov,R., Techniques for searching for structural similarities between protein cores, protein surfaces and between protein-protein interfaces, Techniques in Protein Chemistry VII, 419–429, 1996.
[WALL97]Wallace,A.C., Borkakoti,N. and Thornton,J.M., TESS: A geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases: Application to enzyme active sites, Protein Science, 6:2308-2323, 1997.
[WATE84]Waterman,M.S., Arratia,R., and Galas,D.J., Pattern recognition in several sequences: consensus and alignment, Bulletin of Mathematical Biology, 46, 4, 515–527, 1984
[WOLF97]Wolfson,H.J., and Rigoutsos,I., Geometric hashing: An overview, IEEE Computational Science and Engineering, 4, 10–21, 1997.
[ZEML03]Zemla,A., Lga: A method for finding 3D similarities in protein structures, Nucleic Acids Research, 31, 13, 3370–3374, 2003.
[ZHAN94]Zhang,Z., Iterative point matching for registration of free-form curves and surfaces, Int. J. Comput. Vision, 13, 2, 119-152, 1994.



QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top