National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: Ssu-Hua Huang (黃思華)
Title: Prediction of Outer Membrane Proteins by Support Vector Machines Using Combinations of Gapped Amino Acid Pair Compositions
Title (Chinese): 利用數種蛋白質序列的胺基酸對組成之資訊預測外膜蛋白質
Advisor: Ru-Sheng Liu (劉如生)
Degree: Master's
Institution: Yuan Ze University (元智大學)
Department: Computer Science and Engineering (資訊工程學系)
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: English
Pages: 45
Keywords: outer membrane protein prediction; support vector machine; gapped amino acid pair composition; β-barrel membrane proteins
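Proteins of Gram-negative bacteria can reside at one of five subcellular localizations; those located in the outer membrane are called outer membrane proteins (OMPs). Because these proteins are directly exposed on the outermost layer of the bacterial cell, they are very good drug targets. Identifying OMPs experimentally is time-consuming, so an accurate prediction method is urgently needed.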

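Because very few OMPs have known structures, we predict directly from sequence. From each protein sequence we extract several gapped amino acid pair compositions as its features, represent these features as a multidimensional vector, and use the vectors to train our classifier, a support vector machine (SVM). The classifier is trained to distinguish which sequences in a set of protein sequences are OMPs and which are not.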
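The feature extraction described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes that a k-gapped pair is two residues separated by exactly k intervening positions, that counts are normalized by the number of valid pairs, and that the compositions for gaps 0 through a chosen maximum are simply concatenated. The function names `gapped_pair_composition` and `combined_features` are hypothetical.

```python
from itertools import product

# The 20 standard amino acids; 20 x 20 = 400 ordered pair types.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def gapped_pair_composition(sequence, k):
    """Return a 400-dimensional composition of k-gapped residue pairs.

    Assumption: a k-gapped pair is (sequence[i], sequence[i + k + 1]),
    and counts are divided by the number of valid pairs found.
    """
    counts = [0] * len(PAIRS)
    total = 0
    for i in range(len(sequence) - k - 1):
        idx = PAIR_INDEX.get(sequence[i] + sequence[i + k + 1])
        if idx is not None:  # skip pairs with non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [c / total for c in counts]


def combined_features(sequence, max_gap):
    """Concatenate the compositions for gaps 0..max_gap (an assumed scheme)."""
    features = []
    for k in range(max_gap + 1):
        features.extend(gapped_pair_composition(sequence, k))
    return features
```

Under these assumptions, each sequence yields a vector of 400 × (max_gap + 1) values, which could then be passed to an SVM library such as LIBSVM.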

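We validate our method mainly on two datasets. The first contains protein sequences whose localizations have been experimentally verified: 471 are OMPs and the remaining 1,120 are non-OMPs. The second contains 377 OMP sequences and 674 globular protein sequences; these proteins belong to four typical structural classes.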

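On the first dataset, our method achieves 95% precision and 92% recall; on the second, it achieves 96% precision and recall. A comparison with the OMP classifier of PSORTb v.2.0, which applies SVMs to frequent-subsequence information, shows that our features are better than the presence or absence of frequent subsequences, with a notable improvement in recall. Our method also outperforms the statistical method based on dipeptide composition, which indicates that SVMs work better than the purely statistical method. In addition, our recall on all-β globular proteins is markedly higher than that of the dipeptide-composition statistical method. We also built an additional dataset containing inner membrane proteins, on which prediction performance is also good.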
A Gram-negative bacterial protein can reside at one of five primary subcellular localizations. Proteins residing in the outer membrane are called outer membrane proteins (OMPs). Because they are exposed at the surface of the bacterial cell, these proteins attract research interest as drug targets. Since identifying OMPs experimentally is time-consuming, reliable methods for discriminating OMPs from other proteins are urgently needed.

In this thesis, we present a method for OMP prediction by Support Vector Machines (SVMs) using combinations of k-gapped amino acid pair compositions. Two datasets are used to evaluate our method. The first consists of 471 OMP sequences and 1,120 non-OMP sequences, all annotated with experimentally verified subcellular localizations. The second consists of 377 OMPs and 674 globular proteins belonging to four typical structural classes. On the first dataset, our classifier achieves 95% precision and 92% recall. This result indicates that the combination of k-gapped amino acid pair compositions captures more discriminatory information than the occurrences of frequent subsequences. On the second dataset, our classifier achieves 96% in both precision and recall. A comparison with the statistical method based on dipeptide composition indicates that our SVM-based method performs better than the purely statistical method. Furthermore, our classifier performs well on an extended dataset that additionally contains α-helical transmembrane proteins.
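For readers unfamiliar with the two measures reported above, the following sketch computes precision and recall for the OMP class from true and predicted labels. The function name `precision_recall` and the label strings are hypothetical, and the example labels are made up for illustration; they are not data from the thesis.

```python
def precision_recall(true_labels, predicted_labels, positive="OMP"):
    """Compute precision and recall for the positive class.

    precision = TP / (TP + FP): fraction of predicted OMPs that are real OMPs.
    recall    = TP / (TP + FN): fraction of real OMPs that were predicted.
    """
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == positive and truth == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif truth == positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative labels only (not thesis data).
truth = ["OMP", "OMP", "OMP", "non-OMP", "non-OMP"]
predicted = ["OMP", "OMP", "non-OMP", "OMP", "non-OMP"]
precision, recall = precision_recall(truth, predicted)
```

High recall matters here because a missed OMP is a missed candidate drug target, while high precision limits the number of false candidates sent on for experimental verification.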
Coverage
Title Page
Chinese Approval
English Approval
Authorization
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
1 Introduction
1.1 Gram-negative Bacteria and Their Distinct Cell Structures
1.2 Outer Membrane Proteins (OMPs)
1.3 Challenges of Outer Membrane Protein Prediction
1.4 Support Vector Machine Classification
1.5 Overview of Thesis
2 Related Work
2.1 Prediction of Outer Membrane Proteins
2.1.1 Various Classifiers Used in Prediction
2.1.2 Various Features Used in Prediction
2.2 Work Related to Our Method
3 Datasets and Evaluation Methodology
3.1 Datasets
3.1.1 Dataset 1
3.1.2 Dataset 2
3.2 Performance Measures
3.3 5-Fold Cross-validation and Optimal SVM Parameter Selection
3.4 Validity Check Procedure
4 The Approach
4.1 Calculation of k-Gapped Amino Acid Pair Composition
4.2 Combination of k-Gapped Amino Acid Pair Compositions
4.3 Construction of SVM Classifiers
5 Results and Discussions
5.1 Performance of Different Accumulative Combinations
5.2 Discriminating OMPs from Proteins with Other Subcellular Localizations
5.3 Discriminating OMPs from Globular Proteins of Four Structural Classes
5.4 Comparison to Other Methods
5.5 Performance of Each Composition and Other Combinations
5.6 Effect of Different SVM Parameters
5.7 α-Helical Transmembrane Proteins
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
Bibliography
[1] Martelli, P.L., Fariselli, P., Krogh, A. and Casadio, R., “A sequence-profile-based HMM for predicting and discriminating β barrel membrane proteins”, Bioinformatics, 18, 2002, pp. S46-S53.
[2] Bagos, P.G., Liakopoulos, T.D., Spyropoulos, I.C. and Hamodrakas, S.J., “A Hidden Markov Model method, capable of predicting and discriminating β-barrel outer membrane proteins”, BMC Bioinformatics, 5, 2004, pp. 29, the web service at http://bioinformatics.biol.uoa.gr/PRED-TMBB.
[3] Bigelow, H.R., Petrey, D.S., Liu, J., Przybylski, D. and Rost, B., “Predicting transmembrane beta-barrels in proteomes”, Nucleic Acids Research, 32, 2004, pp. 2566-2577.
[4] Zhai, Y. and Saier, M., “The β-barrel finder (BBF) program, allowing identification of outer membrane β-barrel proteins encoded within prokaryotic genomes”, Protein Science, 11, 2002, pp. 2196-2207.
[5] Berven, F.S., Flikka, K., Jensen, H.B. and Eidhammer, I., “BOMP: A program to predict integral β-barrel outer membrane proteins encoded within genomes of Gram-negative bacteria”, Nucleic Acids Research, 32, 2004, pp. W394-W399.
[6] Liu, Q., Zhu, Y., Wang, B. and Li, Y., “Identification of β-barrel membrane proteins based on amino acid composition properties and predicted secondary structure”, Computational Biology and Chemistry, 27, 2003, pp. 355-361.
[7] Gromiha, M.M. and Suwa, M., “A simple statistical method for discriminating outer membrane proteins with better accuracy”, Bioinformatics, 21, 2005, pp. 961-968.
[8] Gromiha, M.M., Ahmad, S. and Suwa, M., “Application of residue distribution along the sequence for discriminating outer membrane proteins”, Computational Biology and Chemistry, 29, 2005, pp. 135-142.
[9] She, R., Chen, F., Wang, K., Ester, M., Gardy, J.L. and Brinkman, F.S.L., “Frequent-subsequence-based prediction of outer membrane proteins”, SIGKDD’03, August 2003, pp. 24-27.
[10] Gardy, J.L., Laird, M.R., Chen, F., Rey, S., Walsh, C.J., Ester, M. and Brinkman, F.S.L., “PSORTb v.2.0: Expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis”, Bioinformatics, 21, 2005, pp. 617-623.
[11] Hua, S. and Sun, Z., “Support vector machine approach for protein subcellular localization prediction”, Bioinformatics, 17, 2001, pp. 721-728.
[12] Park, K.-J. and Kanehisa, M., “Prediction of protein subcellular localizations by support vector machines using compositions of amino acids and amino acid pairs”, Bioinformatics, 19, 2003, pp. 1656-1663.
[13] Guo, J., Lin, Y. and Sun, Z., “A novel method for protein subcellular localization: Combining Residue-couple Model and SVM”, Proceedings of The 3rd Asia-Pacific Bioinformatics Conference, January 2005, pp. 117-129.
[14] Gardy, J.L., Spencer, C., Wang, K., Ester, M., Tusnady, G.E., Simon, I., Hua, S., deFays, K., Lambert, C., Nakai, K. and Brinkman, F.S.L., “PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria”, Nucleic Acids Research, 31, 2003, pp. 3613-3617.
[15] Chang, C.-C. and Lin, C.-J., “LIBSVM: A library for support vector machines”, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[16] Ikeda, M., Arai, M., Okuno, T. and Shimizu, T., “TMPDB: A database of experimentally-characterized transmembrane topologies”, Nucleic Acids Research, 31, 2003, pp. 406-409.
[17] Sonnhammer, E.L.L., von Heijne, G. and Krogh, A., “A hidden Markov model for predicting transmembrane helices in protein sequences”, Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, 6, 1998, pp. 175-182, the web server at http://www.cbs.dtu.dk/services/TMHMM/.