National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: Chi-hua Tung (董其樺)
Thesis title: PiSA-BLAST: A New Tool for Protein Structure Alignment and Database Search (Chinese title: PiSA-BLAST:快速蛋白質結構比對與資料庫搜尋工具)
Advisor: Jinn-moon Yang (楊進木)
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Bioinformatics
Discipline: Life Sciences
Field: Bioinformatics
Thesis type: Academic thesis
Year of publication: 2005
Graduation academic year: 93 (ROC calendar)
Language: English
Pages: 79
Keywords: protein structure alignment; structure database search; kappa and alpha angle; substitution matrix; real-time web services
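Abstract (translated from Chinese):
In recent years, as the number of protein structures has grown rapidly, effective methods for searching structure databases have become increasingly important. When a new protein crystal structure is determined, researchers want to know whether it resembles other proteins of known structure, and to what degree. Because the number of protein crystal structures is so large, researchers need an accurate and efficient tool for finding similar structures. In this thesis, we develop a new tool, PiSA-BLAST, that delivers accurate alignment results while greatly increasing the speed of structure searches.
The tool is based on two protein descriptors defined by the DSSP program, the kappa angle and the alpha angle. Analyzing them with a clustering algorithm yields a table of conversion rules. Using this table, every protein of known structure in the structure database is converted into a one-dimensional sequence, and these sequences are assembled into a sequence database. From this sequence database we also derive a new scoring matrix for computing alignment scores. We then combine these with the well-known sequence alignment tool BLAST: given a query protein structure, the tool rapidly searches and aligns against a large structure database without actually superimposing two tertiary structures, and returns a list of similar proteins.
We selected five test sets from the SCOP and PDB databases to evaluate PiSA-BLAST. Taking as an example the search results for 108 query structures against SCOP 95, a database of 9,354 protein structures, the average precisions of PiSA-BLAST and CE over the 108 queries were 78.2% and 82.1%, respectively. The total search time was only 34 seconds for PiSA-BLAST, far less than the 1,169,832 seconds required by CE. PSI-BLAST achieved an average precision of 69.8% and took 18.3 seconds in total. The results of this thesis support the following conclusions. (1) PiSA-BLAST searches structure databases at a speed close to that of BLAST and is about 34,000 times faster than CE. (2) PiSA-BLAST achieves accuracy close to that of CE and gives more accurate search results than amino-acid-based sequence alignment tools such as BLAST and PSI-BLAST. These results show that the structural encoding and scoring matrix we developed are sound and usable for structure alignment. (3) Just as BLAST outputs an e-value for sequence alignments, PiSA-BLAST provides one for structure searches. In our tests, PiSA-BLAST reached 90% accuracy when the e-value was below the threshold e-15. (4) PiSA-BLAST can serve as a fast screening tool for structure alignment: a quick first pass produces candidate hits, which are then analyzed in a second pass with slower but more thorough and reliable tools such as CE and DALI. (5) A PiSA-BLAST web service has been set up so that users can search structure databases online in real time. Taken together, this research should contribute substantially to structural genomics and proteomics.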
Abstract:
Structure database searching has become increasingly important as the number of known protein structures grows. This growth was nearly exponential in the early 1990s and has become linear over the past several years. As more protein crystal structures become available, demand is high for a fast and accurate method of searching for structures similar to a query structure. In this thesis, we developed a new tool, PiSA-BLAST, for protein structure database search that does not require aligning two 3D structures.
We developed a new method for protein structure alignment that transforms 3D structures into 1D sequences. The method uses the kappa and alpha angles, as derived by the DSSP program, to represent the 3D structure of a protein. Using segment information and a clustering method, we transform the kappa and alpha angle information into coded regions, so that each protein 3D structure can be converted into a 1D sequence over 23 new codes. We also developed a new substitution matrix for these 23 codes that serves as the scoring matrix for sequence alignment. The encoded sequences are collected into a structure database. BLAST, a well-known sequence alignment tool, is then run against this database and quickly returns a list of proteins that are structurally similar to the query.
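The encoding step can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assigns one code per residue by nearest centroid, whereas the thesis works on segments and derives 23 codes by clustering. The centroid values, the code letters `A`, `B`, `C`, and the function names are all hypothetical.

```python
import math

# Hypothetical cluster centres as (kappa, alpha) in degrees, one per code.
# The thesis derives 23 codes by clustering; these three are made-up values.
CENTROIDS = {
    "A": (110.0, 50.0),
    "B": (30.0, -170.0),
    "C": (70.0, -90.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (wraps at 360)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def encode(angles, centroids=CENTROIDS):
    """Map each (kappa, alpha) pair to the code of its nearest centroid."""
    letters = []
    for kappa, alpha in angles:
        nearest = min(
            centroids,
            key=lambda code: math.hypot(
                angle_diff(kappa, centroids[code][0]),
                angle_diff(alpha, centroids[code][1]),
            ),
        )
        letters.append(nearest)
    return "".join(letters)


# Three residues, each close to one of the hypothetical centroids.
encoded = encode([(108.0, 52.0), (28.0, 175.0), (75.0, -95.0)])
```

The wrap-around distance matters for the alpha angle: 175 degrees and -170 degrees are only 15 degrees apart, so the second residue maps to `B`. A string produced this way could be stored like an ordinary protein sequence and searched with a sequence alignment tool, given a substitution matrix defined over the same code alphabet.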
We evaluated PiSA-BLAST on five diverse data sets from SCOP and the Protein Data Bank (PDB). For the SCOP 95 data set, with 108 queries against 9,354 protein domains, the average precisions of PiSA-BLAST and CE are 78.2% and 82.1%, respectively, and the total execution times are 34 seconds for PiSA-BLAST and about 1,169,832 seconds for CE. For PSI-BLAST, the average precision is 69.8% and the time is 18.3 seconds. From these experiments we draw several observations: (1) PiSA-BLAST is about as fast as BLAST for protein structure database search and is about 34,000 times faster than CE on SCOP 95. (2) The accuracy of PiSA-BLAST approaches that of CE and is much better than that of BLAST and PSI-BLAST, which are based on amino-acid sequences. These results imply that our new structural codes and substitution matrix are useful for protein structure alignment. (3) An e-value of e-15 serves as a significance threshold for structure database search with PiSA-BLAST, much as an e-value of e-3 does for sequence database search with BLAST. PiSA-BLAST achieved about 90% accuracy for a query when the e-value was less than e-15. (4) PiSA-BLAST is a useful filtering tool to run before a detailed database search with tools such as CE and DALI. (5) PiSA-BLAST can provide real-time web services for protein structure database search, as BLAST does for protein sequence search. We believe this work is important for structural genomics and proteomics.
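The timing figures and the filtering idea in observation (4) can be checked with a short Python sketch. The function names and the `(hit_id, evalue)` tuple layout are hypothetical, and reading "e-15" as 1e-15 is an assumption based on the usual BLAST notation.

```python
def summarize_timing(fast_total_s, slow_total_s, n_queries):
    """Return the speed-up and per-query times from total run times in seconds."""
    return {
        "speedup": slow_total_s / fast_total_s,
        "fast_per_query_s": fast_total_s / n_queries,
        "slow_per_query_s": slow_total_s / n_queries,
    }


def prefilter(hits, evalue_cutoff=1e-15):
    """Keep hits below the e-value cutoff, best (smallest e-value) first.

    `hits` is a list of (hit_id, evalue) tuples; this layout is an assumption.
    """
    kept = [hit for hit in hits if hit[1] < evalue_cutoff]
    return sorted(kept, key=lambda hit: hit[1])


# Totals reported in the abstract for 108 queries on SCOP 95.
stats = summarize_timing(fast_total_s=34, slow_total_s=1_169_832, n_queries=108)

# Made-up hits: only those below the cutoff would go on to a slower aligner.
candidates = prefilter([("hit1", 1e-20), ("hit2", 1e-3), ("hit3", 1e-16)])
```

With the reported totals, the ratio is about 34,407, consistent with the stated "34,000 times". Per query, that is roughly 0.3 seconds for PiSA-BLAST against roughly 3 hours for CE.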
Abstract (in Chinese)
Abstract
Acknowledgements (in Chinese)
Contents
List of Tables
List of Figures

Chapter 1. Introduction
1.1 Motivations and Purposes
1.2 Related Works
1.3 Thesis Overview

Chapter 2. Materials and Methods
2.1 Preparing Training Set from Protein Structure Database
2.2 Dividing Protein Structures into Segments by Kappa-Alpha Angle Map
2.3 Finding Representative Segments and Using Nearest Neighbor Clustering Algorithm for New Codes Assigning
2.4 Generating a Substitution Matrix for 23 New Codes
2.5 Structure Searching by Sequence Alignment tool: BLAST and PSI-BLAST
2.6 Evaluating the Performance
2.7 Practical Applications

Chapter 3. Results and Discussions
3.1 Representative Segments of 23 New Codes
3.2 The Substitution Matrix for 23 New Codes
3.3 Evaluating Statistical Significance
3.4 Speed Evaluations
3.5 Performance Factor Analysis: Sequence Identity, Structure Similarity and Expect Value
3.6 Same Searching Cases Analysis
3.7 PiSA-BLAST on Practical Applications
3.8 Web Service

Chapter 4. Conclusions
4.1 Summary
4.2 Major Contributions and Future Perspectives

References