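Graduate student: 陳柏諺
Title (Chinese): 黑腹果蠅蛋白質交互作用預測系統基於雙肽組成
Title (English): A Predicting System for Drosophila Melanogaster Protein-Protein Interaction Based on Dipeptide Composition
Advisor: 阮議聰
Advisor (English name): Eric Y.T.Juan
Degree: Master's
Institution: National Taiwan Ocean University
Department: Department of Computer Science and Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2013
Graduation academic year: 101 (ROC calendar)
Language: Chinese
Pages: 58
Keywords: protein-protein interaction, feature extraction, dipeptide composition, nearest neighbor method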
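Protein-protein interactions (PPIs) play an important role in many biological processes. To understand the molecular mechanisms inside cells, we need information about the protein interactions that allow proteins to carry out their many tasks in the cell. Predicting PPIs has become a popular research topic in recent years, because analyzing them with reliable experiments is expensive and time-consuming. Although high-throughput experimental techniques have produced large amounts of PPI data, these data contain high rates of both false positives and false negatives.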
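Recent studies show that developing and applying computational techniques for PPI prediction is valuable. However, many prediction systems require information about protein homology and/or protein structure. Because protein sequence data are far more abundant than protein structure data, several PPI prediction systems that rely only on sequence information have been developed.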
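In this study, we propose PPI-DC, a protein-protein interaction prediction system for Drosophila melanogaster. In PPI-DC, dipeptide composition is first used to convert protein sequences of different lengths into fixed-length numeric vectors. Second, a baseline difference representation (DLD) is proposed to convert the numeric vectors of a protein pair into a single feature vector. Finally, the k-nearest neighbor algorithm classifies each protein pair as interacting or non-interacting. On the datasets used in previous related studies, our system outperforms the earlier methods, and the experimental results show that the proposed method markedly improves prediction accuracy. In addition, the reliability of the system's PPI predictions has been validated on several independent datasets.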
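The three-step pipeline described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The dipeptide composition follows the common definition (the frequency of each of the 400 ordered pairs of adjacent residues). The pair encoding `encode_pair` is a hypothetical stand-in that takes the element-wise absolute difference of the two vectors, because this record does not give the DLD formula. The k-NN step assumes Euclidean distance and a plain majority vote. All function names, the distance measure and the value of k are assumptions.

```python
import math
from collections import Counter
from itertools import product
from typing import List, Sequence

# The 20 standard amino acids give 20 x 20 = 400 ordered dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq: str) -> List[float]:
    """Map a sequence of any length to a fixed 400-dimensional vector.

    Each entry is the count of one ordered pair of adjacent residues
    divided by the number of valid pairs in the sequence.
    """
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    # Ignore pairs that contain non-standard residues such as X or U.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(valid)
    return [counts[d] / len(valid) for d in DIPEPTIDES]


def encode_pair(seq_a: str, seq_b: str) -> List[float]:
    """HYPOTHETICAL pair encoding (NOT the thesis's DLD method).

    Uses the element-wise absolute difference of the two dipeptide
    vectors, so encode_pair(a, b) == encode_pair(b, a).
    """
    va = dipeptide_composition(seq_a)
    vb = dipeptide_composition(seq_b)
    return [abs(x - y) for x, y in zip(va, vb)]


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[int],
                query: Sequence[float], k: int = 3) -> int:
    """Label `query` by majority vote of its k nearest training vectors (Euclidean)."""
    distances = (math.dist(x, query) for x in train_x)
    nearest = sorted(zip(distances, train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up toy pairs; label 1 = interacting, 0 = non-interacting.
    toy = [("MKTAYIAK", "MKTAYLAK", 1), ("GGSSGGSS", "WWPPWWPP", 0),
           ("MKTLLILA", "MKTLLVLA", 1), ("ACACACAC", "DEDEDEDE", 0)]
    train_x = [encode_pair(a, b) for a, b, _ in toy]
    train_y = [label for _, _, label in toy]
    print(knn_predict(train_x, train_y, encode_pair("MKTAYIAR", "MKTAYIAK"), k=3))
```

With two classes and an odd k, the majority vote cannot tie. The toy sequences in the demo are invented and carry no biological meaning. They only show how a variable-length pair becomes a fixed-length vector that k-NN can compare.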

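Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Thesis Organization
Chapter 2 Related Work
2.1 Related Biological Research
2.2 Other Types of Prediction Systems
2.3 Research Related to Protein-Protein Interactions
2.3.1 Sequence-Based Prediction Systems
2.3.2 Structure-Based Prediction Systems
2.3.3 Domain-Based Prediction Systems
Chapter 3 Test Data and System Architecture
3.1 Test Datasets
3.1.1 Protein-Protein Interaction Datasets
3.2 Overview of the Drosophila Melanogaster PPI-DC System
3.3 Common Protein Representation Methods
3.3.1 Amino Acid Composition (AAC)
3.3.2 Pseudo-Amino Acid Composition (PseAA)
3.3.3 Dipeptide Composition (Dipeptide)
3.3.4 AAwindow: Combining Amino Acid Indices with Sequence Segmentation
3.3.5 Pseudo-Position Specific Scoring Matrix Based on Evolutionary Information (PsePSSM)
3.3.6 Binary Variation Amino Acid Composition (BV-AA)
3.3.7 Binary Variation Method Based on Evolutionary Information
3.4 Protein-Protein Interaction Data Format
3.5 Feature Vector Encoding of Protein Pairs
3.6 Removing Similar Protein Pairs
3.7 Feature Extraction Methods
3.8 K-Nearest Neighbor (KNN) Algorithm
Chapter 4 Experiments and Analysis
4.1 Methods and Criteria for Evaluating Prediction Performance
4.1.1 Evaluation Methods
4.1.2 Criteria for Prediction Performance
4.2 Prediction Performance and System Comparison
4.3 Conclusion
References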