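Graduate student: 陳志堅
Graduate student (English): Chih-Chien Chen
Thesis title: WildPiRa: Improving the Prediction of RNA-Binding Residues in Proteins Using a Support Vector Machine Classifier and Co-Conserved Motifs
Thesis title (English): WildPiRa: improve the prediction of RNA-binding residues of protein sequence using support-vector machine and co-conserved motifs
Advisor: 劉寶鈞
Advisor (English): Baw-Jhiune Liu
Degree: Master's
Institution: Yuan Ze University (元智大學)
Department: Master's Program in Biological and Medical Informatics
Discipline: Life Sciences
Field: Bioinformatics
Thesis type: Academic thesis
Year of publication: 2011
Graduation academic year: 99 (ROC calendar)
Language: Chinese
Number of pages: 50
Keywords (Chinese, translated): conserved pattern mining; RNA-binding region; RNA-binding residues; RNA-binding protein; WildSpan; machine learning classifier; PiRaNhA
Keywords (English): RNA-binding residues; conserved pattern mining; RNA-binding protein; WildSpan; machine learning classifier; Support Vector Machine; PiRaNhA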
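RNA-binding proteins (RBPs) play an important role in the control of gene expression and in other biological processes in the post-genomic era. Predicting or identifying RNA-binding residues (RBRs) from a protein sequence is therefore an important step toward a deeper understanding of biological recognition. Because structural information on RNA-protein interactions is still very scarce, there is a strong need to predict RBRs directly from protein sequence information. This thesis proposes a new hybrid prediction method, called WildPiRa. It combines the co-conserved motifs mined by WildSpan, a constrained sequential pattern mining algorithm, with PiRaNhA, a support vector machine (SVM) classifier that is currently among the better predictors of RBRs. The two sets of predictions are then merged by several different combination methods to predict RBRs from protein sequences. First, we used 117 entries with RNA-protein complexes to compare WildSpan and the PiRaNhA classifier individually. Their mean F-measures were 0.402 and 0.298, respectively, which shows that WildSpan outperforms PiRaNhA. When the predictions of the two are integrated, the F-measure rises from 0.402 and 0.298 to 0.509. In summary, WildSpan alone can effectively predict RBRs from protein sequences. It uses only homologous sequences, does not rely on the structures of protein interactions, and compares favorably with the machine learning classifier. In particular, integrating WildSpan's co-conserved motifs with the classifier's predictions improves prediction performance further.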

The identification of RNA-binding residues (RBRs) in proteins is important in molecular recognition. In the absence of structures for RNA-protein complexes, it is highly desirable to predict RBRs from protein sequences alone. In this thesis, we propose a novel hybrid prediction method, WildPiRa, to tackle this problem. It combines the co-conserved motifs discovered by WildSpan with the results predicted by PiRaNhA, the best SVM-based classifier known to us so far for identifying RBRs in protein sequences. WildSpan is invoked to discover concurrently conserved patterns composed of multiple motifs spanning large wildcard regions in homologous sequences, and PiRaNhA is invoked to predict RBRs from the protein sequence with a trained classifier. Finally, both results are used together to identify RBRs in protein sequences by means of several different combination methods that we propose. We compare WildSpan, PiRaNhA, and WildPiRa on a dataset of 117 RNA-binding proteins. On average, WildSpan using all of the discovered co-conserved motifs achieves an F-measure of 0.402, which is better than the F-measure of 0.298 achieved by PiRaNhA, a classifier trained on structure-based data. WildPiRa further improves the F-measure to 0.509 when both results are integrated to identify RBRs in protein sequences. In conclusion, the sequence-based WildPiRa is not only well suited to predicting RBRs in proteins whose complex structure is unknown but is also highly desirable for large-scale proteomics.
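The evaluation and combination steps described in the abstract can be illustrated with a minimal Python sketch. It treats each predictor's output as a set of residue positions, scores it against the true binding residues with the F-measure (the harmonic mean of precision and recall), and merges the two predictions. The abstract does not specify which combination methods the thesis uses, so the `union` and `intersection` modes here are assumptions for illustration only. The function names `f_measure` and `combine` and the toy positions are hypothetical and are not taken from the thesis.

```python
from typing import Set


def f_measure(predicted: Set[int], actual: Set[int]) -> float:
    """F-measure (harmonic mean of precision and recall) over residue positions."""
    true_pos = len(predicted & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(actual)
    return 2 * precision * recall / (precision + recall)


def combine(motif_hits: Set[int], svm_hits: Set[int], mode: str = "union") -> Set[int]:
    """Merge two residue-level predictions.

    Assumption: 'union' and 'intersection' are illustrative combination rules;
    the source does not say which rules WildPiRa actually applies.
    """
    if mode == "union":
        return motif_hits | svm_hits
    if mode == "intersection":
        return motif_hits & svm_hits
    raise ValueError(f"unknown mode: {mode}")


# Toy data, made up for illustration (not results from the thesis).
actual = {3, 4, 5, 10, 11, 12}   # true RNA-binding residue positions
motif_hits = {3, 4, 5, 20}       # positions covered by conserved motifs
svm_hits = {10, 11, 30, 31}      # positions flagged by a classifier

print(f"motif only: {f_measure(motif_hits, actual):.3f}")
print(f"svm only:   {f_measure(svm_hits, actual):.3f}")
print(f"union:      {f_measure(combine(motif_hits, svm_hits), actual):.3f}")
```

In this toy case the two predictors recover different binding residues, so their union scores higher than either one alone. That is the kind of complementarity the reported improvement from 0.402 and 0.298 to 0.509 suggests, although the numbers here are not comparable to the thesis results.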

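Title Page
Thesis Oral Defense Committee Approval
Chinese Abstract
English Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
Section 1. Basic Concepts
I. The Central Dogma of Biology
II. Proteins
III. Bioinformatics Databases
IV. RNA-Protein Interactions
Section 2. Research Motivation
Section 3. Research Scope
Section 4. Thesis Organization
Chapter 2. Literature Review
Section 1. Prediction of RNA-Protein Interactions
Section 2. Attribute Sequence Feature Vectors
Section 3. WildSpan
Chapter 3. Research Methods
Section 1. Experimental Dataset
Section 2. Attribute Dataset
Section 3. Building Attribute Sets with a Sliding Window
Section 4. Selection Methods for Conserved Patterns and the Integrated Classifier
Section 5. Cross-Validation
Section 6. Evaluation of Prediction Performance
Chapter 4. Experimental Results
Section 1. Analysis of PiRaNhA Predictions
Section 2. Comparison of WildSpan and PiRaNhA Predictions
Section 3. Predictions of WildSpan Integrated with the Classifier: Component Analysis
Chapter 5. Conclusion
References
Appendix: Experiment Operation Manual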



