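Graduate student: 游景盛
Graduate student (romanized name): Chin-Sheng Yu
Thesis title (Chinese): 使用支持向量機以n個數擷取編碼蛋白質及表決法對蛋白質細部分類的研究
Thesis title (English): Fine-grained Protein Fold Assignment by Support Vector Machines using generalized n-peptide Coding Schemes and jury voting from multiple parameter sets
Advisors: 呂平江, 黃鎮剛
Advisors (romanized names): P. C. Lyu, J. K. Hwang
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Life Science
Discipline: Life sciences
Field: Biology
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: English
Number of pages: 47
Chinese keywords: 支持向量機 (support vector machine)
Foreign-language keywords: SVM, SCOP, n-peptide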
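In recent protein research, techniques for solving tertiary structures have improved steadily, but they remain far slower than determining primary structures. Being able to infer a protein's structure from its sequence alone would greatly help future research. Unlike secondary structure prediction, here we need only sequence information from local regions, and we identify different kinds of proteins by combining this regional information across the whole protein. In previous studies, machine learning methods achieved considerable accuracy in recognizing different proteins directly from sequence information encoded over the whole protein. Here we use a different encoding, the n-peptide distribution, and apply the support vector machine (SVM) method to 385 proteins representing 27 classes in the Structural Classification of Proteins (SCOP) database.
In our experiments, the prediction accuracy reached 69.6% on an independent data set and 55.5% under 10-fold cross validation. The results show that encoding the sequence information of the whole protein with a suitable scheme can improve SVM prediction of protein classes, and we hope to find methods or rules that are more useful for protein modeling.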

Fold assignment directly from sequences is valuable in the prediction of protein structures. Unlike secondary structure prediction, where a local coding scheme of sequence information will usually suffice, fold identification calls for global protein descriptors as well as local descriptors for whole protein sequences. Previous studies have shown that machine learning methods can yield reasonable prediction accuracy of fold assignment directly from sequences by a variety of global sequence coding schemes. In this thesis, using global protein descriptors based on the n-peptide distribution, we apply the support vector machine (SVM) method to the 27 most populated folds, which contain 386 representative proteins, in the Structural Classification of Proteins (SCOP) database. Our approach achieved a prediction accuracy of 69.6% on an independent set and 55.5% in ten-fold cross validation, both of which are higher than those of current methods. Our results show that SVM using suitable global sequence coding schemes can significantly improve fold recognition from sequences, and should offer a useful tool in structure modeling.
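As a rough illustration of what a global descriptor based on the n-peptide distribution might look like, the sketch below counts every overlapping window of n residues in a sequence and normalizes the counts into a fixed-length frequency vector. This record does not specify the exact coding scheme used in the thesis, so the function name `n_peptide_distribution`, the 20-letter alphabet ordering, and the handling of non-standard residues are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product
from typing import List

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_peptide_distribution(sequence: str, n: int = 2) -> List[float]:
    """Return normalized frequencies of all 20**n possible n-peptides.

    Hypothetical helper, not the thesis's actual coding scheme.
    Entries are ordered lexicographically over AMINO_ACIDS.
    Windows containing a non-standard residue are skipped.
    """
    sequence = sequence.upper()
    # Slide a window of length n over the sequence, one residue at a time.
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    keys = ("".join(p) for p in product(AMINO_ACIDS, repeat=n))
    # Divide by the number of valid windows so proteins of different
    # lengths map to comparable vectors; an empty input gives all zeros.
    return [counts[k] / total if total else 0.0 for k in keys]


if __name__ == "__main__":
    vec = n_peptide_distribution("ACAC", n=2)
    print(len(vec))  # one entry per possible dipeptide
```

A vector of this kind has the same length for every protein regardless of sequence length, which is what allows it to be passed to an SVM classifier as a global descriptor. How the thesis combines several values of n or several parameter sets is not described in this record.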

Introduction
Methods
Data sets and input coding schemes
Training and testing procedures
Results
Discussion
References
Tables
Figures
Appendix

