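Graduate student: Li-Yuan Chiu (邱莉媛)
Thesis title: Applying Machine Learning on Prediction of RNA-Binding Residues in Proteins
Thesis title (Chinese): 應用機器學習方法預測核糖核酸與蛋白質結合位置
Advisor: Chien-Kang Huang (黃乾綱)
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Engineering Science and Ocean Engineering
Discipline: Engineering
Field: General Engineering
Thesis type: Academic thesis
Year of publication: 2010
Graduation academic year: 98 (ROC calendar)
Language: English
Number of pages: 62
Keywords: Machine Learning; Support Vector Machine; RNA Binding Residues Prediction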
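RNA-binding proteins play an important role in recognizing sequences in ribonucleic acid (RNA), because this information originates from deoxyribonucleic acid (DNA). To meet diverse functional requirements, RNA-binding proteins are composed of many repeated binding domains, each with its own structural arrangement to provide a different function. Applying machine learning to predict RNA-binding sites in proteins can help molecular biology researchers quickly screen for likely RNA interaction sites and mechanisms.
ProteRNA, the prediction method proposed in this thesis, combines the results of two tools: a support vector machine (SVM) and WildSpan, a protein sequence mining tool. The SVM predicts from PSSM and protein secondary structure information, while WildSpan predicts from sequence conservation. The SVM alone achieves an F-score of 0.5127; merging in the WildSpan predictions raises the F-score to 0.5362, which is better than other current prediction methods. In independent testing, ProteRNA reaches an overall accuracy of 89.55%, a Matthews correlation coefficient (MCC) of 0.2686, and an F-score of 0.3185, outperforming other existing online RNA-binding site prediction services.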

RNA-binding proteins (RBPs) are vital for recognizing sequences of ribonucleic acid (RNA), the genetic material derived from DNA. To satisfy diverse functional requirements, RNA-binding proteins are composed of multiple repeated blocks of RNA-binding domains, presented in various structural arrangements to provide versatile functions. The ability to computationally predict RNA-binding residues in an RNA-binding protein can give biologists clues for site-directed mutagenesis in wet-lab experiments. ProteRNA, the prediction framework proposed in this thesis, combines a Support Vector Machine (SVM) and WildSpan to identify RNA-interacting residues in an RNA-binding protein. The SVM predicts from position-specific scoring matrix (PSSM) and protein secondary structure information, while WildSpan relies on conserved domain information. The SVM predictor alone achieves an F-score of 0.5127, and the hybrid predictor that adds WildSpan achieves an F-score of 0.5362. On the independent test dataset, ProteRNA delivers an overall accuracy of 89.55%, an MCC of 0.2686, and an F-score of 0.3185, surpassing the other web servers in accuracy, MCC, and F-score.
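The abstract reports accuracy, MCC, and F-score together, and the gap between a high accuracy and a much lower F-score is typical when binding residues are a small minority of all residues. The following is a minimal sketch of how these three measures are commonly computed from a binary confusion matrix, assuming the standard textbook definitions; the thesis's own formulas (Section 3-3) are not reproduced in this record. The function name `binary_metrics` and the example counts are hypothetical and are not data from the thesis.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F-score, and MCC from confusion-matrix counts.

    tp/fn: binding residues predicted as binding / non-binding.
    tn/fp: non-binding residues predicted as non-binding / binding.
    Standard definitions are assumed; undefined ratios fall back to 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f_score = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # Matthews correlation coefficient: ranges from -1 to 1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f_score": f_score, "mcc": mcc}


# Hypothetical, imbalanced example: 50 binding residues out of 1000.
example = binary_metrics(tp=30, fp=50, tn=900, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

In this made-up example, accuracy is 0.93 even though fewer than half of the predicted binding residues are correct, which is why MCC and F-score are usually reported alongside accuracy for this kind of task.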

Table of Contents
Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Background
1-2 Motivation
1-3 Summary of Paper Organization
Chapter 2 Literature Review
2-1 Central Dogma
2-2 The Attributes of Amino Acid
2-3 Position-Specific Scoring Matrix
2-4 Secondary Structure Information
2-5 Classifier - Support Vector Machines
2-6 WildSpan
2-7 Related Works
Chapter 3 Method
3-1 Problem Definition
3-2 Data Set
3-3 Performance Measure
3-4 Feature Selection
3-5 Normalization
3-6 Single Predictor Model
3-7 Hybrid Model
3-8 System Architecture
Chapter 4 Results and Discussion
4-1 Distinct Normalization Results
4-2 Performance of Single Predictor
4-3 Performance of Hybrid Model
4-4 Comparison with Other Approaches
4-5 Independent Test and Comparison with Other Approaches
4-6 Independent Test Case Discussion
Chapter 5 Conclusion and Further Directions
5-1 Conclusion
5-2 Further Directions
References

