Thesis title (English): Predicting DNA-binding Proteins using Disorder Information
Advisor (English): Tien-Hao Chang
Keywords (English): Disorder; DNA-binding Protein
Identifying DNA-binding proteins (DBPs), which play a crucial role in regulatory networks, is an important task in proteomics and genome annotation. DBPs have been shown to be recognizable by their structural and sequence characteristics. Methods based on structural information achieve better performance, whereas methods based only on sequence information are more broadly applicable because they can be applied to proteins without solved crystal structures. A compromise is to use predicted structural information. In recent years, secondary structure and solvent accessibility have been the two structural features with reliable prediction packages, and both have been incorporated into DBP prediction with some success.
Considerable biological evidence indicates that DNA binding often leads to a disorder-to-order conformational change, the so-called induced folding. The proposed method therefore incorporates predicted protein disorder information. This is the first study to use such information for DBP prediction. A protein sequence in this study is described by four groups of features: (a) amino acid composition, (b) the position-specific scoring matrix (PSSM) obtained with PSI-BLAST, (c) the proportions of ordered and disordered regions, and (d) the secondary structure composition of the ordered regions. The first two groups are widely used in previous studies; the third is based on predicted disorder information; and the last combines predicted secondary structure and disorder information. The experimental results show that the proposed method (81.3% F-measure) outperformed the two compared methods (64.1% and 76.7% F-measure). We also analyzed the Protein Data Bank and found that around 20% of DBPs undergo a disorder-to-order transition upon binding DNA. These results encourage further efforts to exploit protein disorder information in DBP prediction.
DNA-binding proteins participate in various cellular processes, and characterizing these proteins is of great importance. We propose a novel DBP predictor that uses predicted secondary structure and disorder information. Its promising performance highlights the importance of disordered regions in DNA-binding proteins.
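As a rough illustration of feature groups (a), (c), and (d), the sketch below builds a feature vector from a sequence plus per-residue predictions. It is not the thesis's implementation: the function names, the three-state secondary structure alphabet (H, E, C), and the boolean per-residue disorder mask are all assumptions made for this example. The PSSM group (b) is omitted because it requires running PSI-BLAST.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Group (a): fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def disorder_proportion(disorder_mask):
    """Group (c): (ordered fraction, disordered fraction).

    disorder_mask is a hypothetical per-residue list of booleans,
    True where a predictor labels the residue as disordered.
    """
    n = len(disorder_mask)
    if n == 0:
        return [0.0, 0.0]
    disordered = sum(1 for d in disorder_mask if d) / n
    return [1.0 - disordered, disordered]


def ordered_ss_composition(ss_labels, disorder_mask):
    """Group (d): fraction of H, E, and C among ordered residues only."""
    ordered = [s for s, d in zip(ss_labels, disorder_mask) if not d]
    n = len(ordered)
    return [ordered.count(state) / n if n else 0.0 for state in "HEC"]


def feature_vector(sequence, ss_labels, disorder_mask):
    """Concatenate groups (a), (c), and (d) into one 25-value vector."""
    return (
        amino_acid_composition(sequence)
        + disorder_proportion(disorder_mask)
        + ordered_ss_composition(ss_labels, disorder_mask)
    )


# Toy input: four residues, the third predicted as disordered.
example = feature_vector("ACDA", "HHEC", [False, False, True, False])
```

Restricting the secondary structure counts to ordered residues is what lets group (d) combine both predictions, as the abstract describes; how the thesis actually encodes or normalizes these values is not stated here.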
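The F-measure figures quoted in this abstract are presumably the usual harmonic mean of precision and recall; the abstract does not spell out the definition, so the sketch below assumes the standard balanced form (F1). The function name and the count-based interface are illustrative only.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true positive, false positive,
    and false negative counts. Assumes the standard definition."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy counts: 8 DBPs found, 2 false alarms, 2 DBPs missed.
score = f_measure(tp=8, fp=2, fn=2)
```

Unlike plain accuracy, this measure ignores true negatives, which makes it less sensitive to an imbalance between DBPs and non-binding proteins in the test set.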

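Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Methods using structural information as features
2.2 Methods using sequence information as features
2.3 Classifiers
2.3.1 Support vector machine
2.3.2 Variable kernel density estimation classifier
2.4 Databases
2.4.3 PDB
2.4.4 Swiss-Prot
2.4.5 DisProt
Chapter 3 Datasets and Prediction Method
3.1 Datasets
3.2 Feature sets
3.2.1 Amino acid composition
3.2.2 Amino acid grouping
3.2.3 Position-specific scoring matrix
3.2.4 Protein disorder regions
3.2.5 Secondary structure
3.2.6 Accessible surface area (ASA)
Chapter 4 Experimental Results and Discussion
4.1 Performance evaluation criteria
4.2 Comparison of DNA-binding protein prediction results
4.3 Effect of feature combinations on prediction performance
4.4 Analysis of disorder region information
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future work
References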