National Digital Library of Theses and Dissertations in Taiwan

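Graduate student: 薛正豪
Graduate student (English name): Cheng-Hao Hsueh
Thesis title (Chinese): 以非穩定結構特徵預測DNA結合蛋白
Thesis title (English): Predicting DNA-binding Proteins using Disorder Information
Advisor: 張天豪
Advisor (English name): Tien-Hao Chang
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Electrical Engineering (Master's and Doctoral Program)
Discipline: Engineering
Field: Electrical and computer engineering
Thesis type: Academic thesis
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: Chinese
Number of pages: 52
Keywords (Chinese): DNA結合蛋白 (DNA-binding protein); 非穩定結構 (disordered structure)
Keywords (English): Disorder; DNA-binding Protein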
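DNA-binding proteins play an important role in gene regulation and in the annotation of gene function. Identifying them computationally can reduce experimental cost and make their discovery more efficient, so it has become a popular research goal. Previous studies have shown that DNA-binding proteins can be recognized from structural information and from sequence information: structure-based methods achieve higher accuracy, while sequence-based methods are more broadly applicable. A compromise is to predict structural information from the sequence. This study uses prediction tools to obtain two structure-related features, secondary structure and accessible surface area, both of which earlier studies have shown to help in predicting DNA-binding proteins.
Studies have indicated that binding between DNA and a protein causes disordered regions of the protein to become ordered. For this reason, this study is the first to add information on disordered regions as a feature for predicting DNA-binding proteins. The proposed feature set uses the position-specific scoring matrix, secondary structure information, and surface area information. With RVKDE as the classifier, this feature set reaches a performance of 81.3% (F-measure). Finally, we analyzed the PDB database and the positive set of our dataset and found that about 20% of DNA-binding proteins show a disorder-to-order structural transition between the unbound and bound states, which indicates that information on disordered regions is indeed related to DNA-binding proteins.
DNA-binding proteins are indispensable to living organisms, and identifying them is of great importance. We propose a method that combines sequence information and structural information and adds disordered-region features. It achieves good predictive performance, and our experiments demonstrate the relevance of disordered regions to DNA-binding proteins.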
Background:
Identifying DNA-binding proteins (DBPs), which play a crucial role in the regulation network, is an important task in proteomics and genome annotation. DBPs have been shown to be recognizable by their structure and sequence characteristics. Methods based on structure information achieve better performance, while methods based only on sequence information have broader applications, since they can be applied to proteins without crystallized structures. A compromise strategy is to use predicted structure information. In recent years, secondary structure and solvent accessibility have been the two structure features with reliable prediction packages, and both have been incorporated into DBP prediction with some success.
Results:
Considerable biological evidence indicates that DNA binding probably leads to disorder-to-order conformational changes, the so-called induced folding. The proposed method therefore includes predicted protein disorder information. This is the first study to use such information for DBP prediction. A protein sequence in this study is described by four groups of features: a) amino acid composition, b) position-specific scoring matrix (PSSM) obtained with PSI-BLAST, c) proportion of ordered and disordered regions, and d) secondary structure composition in the ordered regions. The first two groups are widely used in previous studies; the third group is based on the predicted disorder information; and the last group combines predicted secondary structure and disorder information. The experimental results show that the proposed method (81.3% F-measure) outperformed the two compared methods (64.1% and 76.7% F-measure). We also analyzed the Protein Data Bank database and found that around 20% of DBPs undergo a disorder-to-order transition upon binding DNA. These results encourage further work on exploiting protein disorder information in DBP prediction.
Conclusions:
DNA-binding proteins participate in various cellular processes, and characterization of these proteins is of great importance. We proposed a novel DBP predictor that uses predicted secondary structure and disorder information. Its promising performance reveals the importance of disordered regions in DNA-binding proteins.
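
The sequence-level feature groups a), c) and d) named in the Results, and the F-measure used to report performance, can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names, the per-residue encodings ('D'/'O' for disordered/ordered, 'H'/'E'/'C' for helix/strand/coil), and the use of the standard F1 definition are all assumptions made for illustration. The PSSM group is omitted because it requires running PSI-BLAST.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def order_disorder_proportion(disorder_mask):
    """Return (ordered fraction, disordered fraction).

    disorder_mask is a hypothetical per-residue string of 'O' and 'D',
    as might be derived from a disorder predictor's output.
    """
    n = len(disorder_mask)
    if n == 0:
        return (0.0, 0.0)
    disordered = disorder_mask.count("D") / n
    return (1.0 - disordered, disordered)


def ordered_ss_composition(secondary_structure, disorder_mask):
    """Fractions of helix (H), strand (E), coil (C) among ordered residues only."""
    ordered = [s for s, m in zip(secondary_structure, disorder_mask) if m == "O"]
    n = len(ordered)
    return [ordered.count(c) / n if n else 0.0 for c in "HEC"]


def f_measure(tp, fp, fn):
    """Standard F1: harmonic mean of precision and recall (assumed definition)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    seq = "MKRAAC"
    mask = "DDOOOO"
    ss = "CCHHHE"
    features = (
        amino_acid_composition(seq)
        + list(order_disorder_proportion(mask))
        + ordered_ss_composition(ss, mask)
    )
    print(len(features))  # 20 + 2 + 3 values
    print(f_measure(tp=8, fp=2, fn=2))
```

Concatenating the groups into one fixed-length vector per protein is one common way to feed such features to a classifier. Whether the thesis normalizes or weights the groups differently is not stated in this record.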

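Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Methods using structural information as features
2.2 Methods using sequence information as features
2.3 Classifiers
2.3.1 Support vector machine
2.3.2 Variable kernel density estimation classifier
2.4 Databases
2.4.3 PDB
2.4.4 Swiss-Prot
2.4.5 DisProt
Chapter 3 Dataset and Prediction Method
3.1 Dataset
3.2 Feature set
3.2.1 Amino acid composition
3.2.2 Amino acid grouping
3.2.3 Position-specific scoring matrix
3.2.4 Protein disorder region
3.2.5 Secondary structure
3.2.6 Accessible surface area (ASA)
Chapter 4 Experimental Results and Discussion
4.1 Criteria for evaluating prediction performance
4.2 Comparison of DNA-binding protein prediction results
4.3 Effect of feature combinations on prediction performance
4.4 Analysis of disordered-region information
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future work
References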