National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

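Graduate student: Jeng-Sheng Yeh (葉正聖)
Thesis title: 3D Protein Retrieval Based on Pocket Modeling and Matching
Thesis title (Chinese): 三維蛋白質模型檢索:基於功能口袋之建構與比對
Advisor: Ming Ouhyoung (歐陽明)
Degree: Doctoral
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2005
Graduation academic year: 93 (ROC calendar)
Language: English
Number of pages: 109
Keywords (translated from Chinese): computer graphics; 3D model retrieval; protein retrieval; protein functional regions; protein binding pockets; bioinformatics; bio-graphics; multi-view Zernike moments; spin-images
Keywords (English): computer graphics; 3D model retrieval; protein retrieval; protein function sites; minimal binding surface; bioinformatics; bio graphics; multi-view Zernike moments; spin-images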
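ABSTRACT (TRANSLATED FROM CHINESE)
This thesis proposes a framework for partial matching of three-dimensional (3D) protein structures. We use the framework to implement a 3D protein retrieval system that screens for similar proteins. For a protein structure of unknown function, the system may suggest possible functions or drug-binding information by finding similar protein structures whose functions are known. The system can serve as a front-end filter that discards most unlikely proteins and provides suggestions for subsequent, more precise matching.
The system pipeline has three stages: binding-pocket modeling, matching, and result refinement. Because binding pockets play an important role when proteins interact, our first step is to locate the concave pockets of a protein. We roll a virtual sphere of a given radius over the entire protein surface. If the sphere contains sufficiently many protein surface atoms, this suggests that the sphere is probably buried deep inside the protein, which is a characteristic of a pocket. We then treat the center of that sphere as a candidate binding pocket.
After pocket modeling, the next step is matching. We implemented two matching methods: multi-view Zernike moments and spin images. Multi-view Zernike moments compare visual similarity from many different viewing directions to find shapes that are similar overall. Spin images use different surface reference points and their corresponding tangent planes as reference coordinate systems, so they are unaffected by arbitrary spatial rotation and can match small surface regions. Both methods match 3D shapes under different rotations and can therefore help compare 3D protein structures. Thus, to predict the function of a 3D protein structure of unknown function, we can locate its binding pockets and then use the two matching methods to find similarly shaped proteins in a database of known proteins.
Our method can match similar 3D proteins, receptors, or bindable chemical molecules in a reasonable time. Experts in biochemistry and biology can therefore use it as a pre-screening filter, combined with other tools, and use the shape-similarity information for function prediction of unknown proteins, identification of possible functional regions, and drug-binding suggestions.
As for results, first, we have a 3D protein retrieval system that covers all of PDB and FSSP. Through an easy-to-use web interface, we compare protein structures by multi-view visual similarity. Results are returned over the network in less than three seconds, and the accuracy of each query is about 90 percent.
Second, a more difficult problem is finding possible receptor sites and their corresponding ligands or inhibitors. Our system has some preliminary results: (a) with more than one hundred different proteins available for comparison, it finds proteins whose receptor sites are similar to the query receptor site in about 70 minutes; (b) among 20 inhibitors/ligands, it can find possible binding sites in 17 minutes, with each pair taking about 50 seconds on average. The machine used was a Pentium IV 2.4 GHz PC. Regarding accuracy, for comparison (a) among 107 protein candidates, (i) considering only the top-ranked protein, accuracy reaches 68%; (ii) considering the top five ranked proteins, accuracy reaches 95%. Comparison (b) has so far been explored only through small case studies and needs further investigation.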
ABSTRACT
A framework for matching partial surface data of three-dimensional (3D) protein structures is proposed. We use this framework to build a retrieval system for 3D protein structures. For a protein of unknown function, the system can suggest possible functions or corresponding binding drugs based on known proteins of similar shape in our database. It serves as a front-end filter that reduces the search space for more accurate searches by other methods.
The pipeline of our system has three stages: pocket modeling, matching, and refinement. First, we extract and model the possible binding pockets of proteins, since binding pockets are the active sites in protein-protein and protein-ligand interactions. We use the “Sphere Coverage” method to find the binding pockets: we roll a virtual sphere along the solvent-accessible surface, and if more than 50 percent of the sphere's space is filled by atoms of the protein, the sphere is likely to be near a concave part of the protein.
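The 50 percent fill test described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names `sphere_fill_fraction` and `pocket_candidates`, the grid-sampling estimate of the filled volume, and the assumption that probe positions on the solvent-accessible surface are already given are all hypothetical choices made for this example.

```python
import numpy as np

def sphere_fill_fraction(center, probe_radius, atom_centers, atom_radii, n_grid=15):
    """Estimate the fraction of a probe sphere's volume occupied by atoms.

    Assumption: the volume is approximated by regular grid samples inside
    the probe sphere; the thesis does not state how the fill is measured.
    """
    atom_centers = np.asarray(atom_centers, dtype=float)
    atom_radii = np.asarray(atom_radii, dtype=float)
    # Regular grid over the bounding cube, clipped to the probe sphere.
    axis = np.linspace(-probe_radius, probe_radius, n_grid)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    offsets = np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)
    offsets = offsets[np.linalg.norm(offsets, axis=1) <= probe_radius]
    samples = np.asarray(center, dtype=float) + offsets
    # A sample counts as filled if it lies inside at least one atom sphere.
    dists = np.linalg.norm(samples[:, None, :] - atom_centers[None, :, :], axis=2)
    filled = (dists <= atom_radii[None, :]).any(axis=1)
    return filled.mean()

def pocket_candidates(probe_centers, probe_radius, atom_centers, atom_radii,
                      threshold=0.5):
    """Return the probe centers whose fill fraction exceeds the threshold."""
    return [tuple(c) for c in probe_centers
            if sphere_fill_fraction(c, probe_radius, atom_centers, atom_radii) > threshold]
```

A probe resting on a convex patch of the surface is less than half filled, so it is rejected, while a probe surrounded by atoms passes the test and its center becomes a pocket candidate.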
After constructing the 3D models of binding pockets, we implemented two algorithms for matching: multi-view Zernike moments and spin images. Multi-view Zernike moments match the global shape by visual similarity from many different viewing directions. Spin images find local surface features in a rotation-invariant way with respect to a reference point and its corresponding surface normal. Both rotation-invariant methods help in matching 3D protein data. In our experiments, given an unknown 3D protein, we extract and model its possible binding pockets and then use the two methods to retrieve similar proteins from the database.
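To illustrate why spin images are rotation invariant, the sketch below builds a basic spin image in the standard form: each surface point is mapped to its radial distance from the line through the reference point along the normal (alpha) and its signed height along that normal (beta), and the pairs are accumulated in a 2D histogram. The function name `spin_image`, the bin size, and the image width are assumptions for this example; it does not reproduce the OpenGL or GPU extensions listed in the table of contents.

```python
import numpy as np

def spin_image(points, p, n, bin_size=1.0, image_width=8):
    """Accumulate a spin image for the oriented point (p, n).

    alpha: distance of each point from the axis through p along n.
    beta:  signed height of each point along n, measured from p.
    Both depend only on positions relative to (p, n), so rotating the
    whole model together with p and n leaves the image unchanged.
    """
    points = np.asarray(points, dtype=float)
    p = np.asarray(p, dtype=float)
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)          # make sure the normal has unit length
    d = points - p
    beta = d @ n
    alpha = np.sqrt(np.maximum((d * d).sum(axis=1) - beta ** 2, 0.0))
    half_height = image_width * bin_size / 2.0
    image, _, _ = np.histogram2d(
        alpha, beta, bins=image_width,
        range=[[0.0, image_width * bin_size], [-half_height, half_height]],
    )
    return image
```

Two surface patches can then be compared by comparing their spin images, for example with a correlation measure; points that fall outside the histogram range are ignored, which keeps the descriptor local.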
Because our method can match 3D proteins, their receptors, and ligands in a reasonably short time, it can serve as a preliminary filter that gives biochemists and biologists useful information for function prediction, such as possible functional sites of unknown proteins or suggestions for drug binding.
First, a web-based 3D protein retrieval system is available for protein structure data covering the entire PDB and FSSP databases. In this system, we use a visual-based matching method to compare protein structures from multiple viewpoints. Each query takes less than three seconds, with about 90 percent accuracy on average.
Second, for the more difficult problem of finding possible receptor sites and their corresponding inhibitors, our system has preliminary results obtained on a 2.4 GHz Pentium IV PC: (a) within 70 minutes, a query receptor site can be used to retrieve proteins with similar receptor sites from 107 different proteins; (b) within 17 minutes, a given receptor site used as a query can retrieve a possible inhibitor/ligand that may fit into it, out of 20 possible inhibitors/ligands, where each receptor/ligand pair takes about 50 seconds to compute. The precision for experiment (a), with a database of 107 candidates, is (i) 68% for the top-ranked result and (ii) 95% for the top five ranked results, that is, the correct answer is among the top five candidates. For experiment (b), only case studies have been done, and formal experiments have yet to be conducted.
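The top-rank and top-five figures quoted for experiment (a) are top-k retrieval rates: a query counts as correct if the right answer appears among its first k ranked candidates. The sketch below shows that calculation on made-up toy rankings (the identifiers and data are hypothetical, not from the thesis), and also checks that the timing quoted for experiment (b) is self-consistent.

```python
def top_k_accuracy(ranked_results, correct_answers, k):
    """Fraction of queries whose correct answer is within the top k results."""
    hits = sum(
        1 for ranked, truth in zip(ranked_results, correct_answers)
        if truth in ranked[:k]
    )
    return hits / len(correct_answers)

# Toy rankings for three queries (hypothetical data for illustration only).
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truths = ["a", "a", "a"]
top1 = top_k_accuracy(rankings, truths, 1)   # only the first query hits
top3 = top_k_accuracy(rankings, truths, 3)   # every query hits

# Timing check for experiment (b): 20 pairs at about 50 seconds each
# is about 16.7 minutes, consistent with the quoted 17 minutes.
total_minutes = 20 * 50 / 60
```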
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 THESIS STATEMENT
1.2 MOTIVATION
1.3 LEVELS OF DIFFICULTY IN MOLECULAR BINDING
1.4 ORGANIZATION
CHAPTER 2 PROTEIN POCKET MODELING
2.1 PROTEINS
2.2 SOLVENT ACCESSIBLE SURFACE
2.3 SPHERE COVERAGE
CHAPTER 3 POCKET MATCHING BY MULTI-VIEW ZERNIKE MOMENTS
3.1 INTRODUCTION
3.2 METHODS
3.3 DEFINITIONS OF THE MULTI-VIEW DESCRIPTORS AND FEATURES
3.4 EVALUATION METHODS
CHAPTER 4 POCKET MATCHING BY SPIN-IMAGES
4.1 INTRODUCTION
4.2 OUR EXTENSION BY OPENGL PROJECTION MATRIX AND RASTERIZATION TECHNOLOGIES
4.3 THE PURE GPU ALGORITHM WITH VERTEX SHADER
4.4 THE PARTIAL MATCHING PROBLEM
4.5 THE SPIN-IMAGES COMPARISON BY G-FUNCTION
CHAPTER 5 PROBLEMS, EXPERIMENTS, AND RESULTS
5.1 INTRODUCTION OF PROBLEMS
5.2 THE EXPERIMENTS AND RESULTS FOR PROBLEM 1
5.3 THE EXPERIMENTS AND RESULTS FOR PROBLEM 2 (A)
5.4 THE EXPERIMENTS AND RESULTS FOR PROBLEM 2 (B)
CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
6.1 CONCLUSIONS
6.2 FUTURE WORKS
BIBLIOGRAPHY
PUBLICATION LIST OF JENG-SHENG YEH