National Digital Library of Theses and Dissertations in Taiwan
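Author: Yi-Han Bai (白逸翰)
Thesis title: Predicting protein-protein interactions based on frequent short sequences in proteins
Advisor: Tien-Hao Chang (張天豪)
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Electrical Engineering (Master's and Doctoral Program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Number of pages: 47
Keywords: protein-protein interaction; short sequence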
Protein-protein interaction (PPI) plays an important role in many biological processes of a cell. Previous work has shown that interacting protein pairs can be predicted effectively by analyzing the composition of their primary structures in terms of short sequences of three consecutive amino acids. However, the length of the short sequences was limited to three merely because the classification tools of the time could not process longer ones, not because longer sequences had been studied and ruled out.
This study analyzes short sequences of different lengths and seeks those that are most effective for predicting PPI. Drawing on term frequency-inverse document frequency (tf-idf) from the field of information retrieval (IR), it proposes sequence frequency-inverse protein frequency (sf-ipf), which estimates the importance of a short sequence to a protein from the frequency with which the sequence is observed in proteins. The experiments show that the short sequences selected by the proposed method substantially improved PPI prediction for two species, human and yeast (Saccharomyces cerevisiae).
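The abstract names sf-ipf only as an analogue of tf-idf and does not give its formula. The following is a minimal sketch of what such a score could look like, assuming the standard tf-idf definition carried over directly: a short sequence plays the role of a term and a protein plays the role of a document. The function names, the toy sequences, and the choice of lengths are hypothetical, and the thesis may define or normalize the score differently.

```python
import math
from collections import Counter


def short_sequences(protein, lengths):
    """Return every contiguous short sequence of the given lengths."""
    return [
        protein[i:i + k]
        for k in lengths
        for i in range(len(protein) - k + 1)
    ]


def sf_ipf(proteins, lengths=(3, 4, 5)):
    """Score each short sequence in each protein, by analogy with tf-idf.

    Assumed definitions (not taken from the thesis):
      sf  = occurrences of the short sequence in the protein
            / total short sequences in the protein
      ipf = log(number of proteins / number of proteins containing it)
    """
    counts = {
        name: Counter(short_sequences(seq, lengths))
        for name, seq in proteins.items()
    }

    # Number of proteins in which each short sequence appears at least once.
    protein_frequency = Counter()
    for per_protein in counts.values():
        protein_frequency.update(per_protein.keys())

    n_proteins = len(proteins)
    scores = {}
    for name, per_protein in counts.items():
        total = sum(per_protein.values())
        scores[name] = {
            s: (c / total) * math.log(n_proteins / protein_frequency[s])
            for s, c in per_protein.items()
        }
    return scores


# Toy input: two made-up sequences, scored on short sequences of length 3.
toy = {"P1": "AAAA", "P2": "AAAC"}
scores = sf_ipf(toy, lengths=(3,))
```

Under these assumptions, a short sequence that occurs in every protein (here `AAA`) scores zero, while one that is specific to a single protein (here `AAC` in `P2`) scores higher, which is the property that makes such a score usable for ranking and selecting short sequences as features.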
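Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
Chapter 2. Related Work
2.1 Principles of protein-protein interaction
2.2 Prediction methods using sequence information as features
2.3 Machine learning techniques
2.3.1 Variable kernel density estimation
2.3.2 Support vector machine
2.3.3 Comparing the two classifiers
2.4 Databases
2.4.1 BioGRID
2.4.2 KEGG
Chapter 3. Datasets and Methods
3.1 Data collection
3.2 Short-sequence selection method
3.3 Feature encoding
Chapter 4. Experimental Results and Discussion
4.1 Experimental procedure
4.1.1 Step 1: Data partitioning
4.1.2 Step 2: Scaling
4.1.3 Step 3: Parameter tuning
4.2 Performance evaluation criteria
4.2.1 Precision
4.2.2 Sensitivity
4.2.3 F-measure
4.2.4 Specificity
4.2.5 Accuracy
4.2.6 Matthews correlation coefficient
4.3 Comparison with other methods
4.3.1 Results with the variable kernel density estimation classifier
4.3.2 Results with the support vector machine classifier
4.4 Assessing the effect of different short sequences on prediction performance
4.4.1 Finding the best feature-vector dimensionality
4.4.2 Checking whether the sf-ipf ranking is correct
4.5 Discussion
4.5.1 Distribution of short sequences of different lengths
4.5.2 The ten most important short sequences
4.5.3 Validation analysis
Chapter 5. Conclusions and Future Work
5.1 Conclusions
5.2 Future work
Chapter 6. Supplementary Material
Two highly dissimilar groups of proteins
References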