National Digital Library of Theses and Dissertations in Taiwan

Graduate student: Lin, Yi-Wen (林以文)
Thesis title: Deriving Cancer Protein Interaction By Integrating Protein-Protein Interaction, Domain Frequency And Cancer Linker Degree Data
Original title (Chinese): 整合蛋白質交互作用、功能域頻率及癌蛋白連接作用維度資料推導癌蛋白交互作用
Advisor: Ng, Ka-Lok (吳家樂)
Oral defense committee: Wang, Jing-Doo (王經篤); Tseng, Ming-Hseng (曾明性); Ng, Ka-Lok (吳家樂)
Oral defense date: 2012-07-12
Degree: Master's
Institution: Asia University (亞洲大學)
Department: Master's Program, Department of Biomedical Informatics
Discipline: Engineering
Field: Biomedical Engineering
Thesis type: Academic thesis
Year of publication: 2012
Graduation academic year: 100
Language: English
Number of pages: 56
Keywords: cancer protein; protein-protein interactions; domain frequency; cancer linker degree; SVM; random forest; neural network; naive Bayes network
ABSTRACT

It is known that many proteins are associated with human disease, yet their precise functional roles in disease pathogenesis often remain unclear. One strategy for gaining a better understanding of the interactions and functions of these proteins is to use protein-protein interaction (PPI) data to construct a set of interaction rules for disease proteins.

The main purpose of this research is to predict cancer proteins by integrating several types of features: domain-domain interactions (DDI), the weighted domain frequency score (DFS), and the cancer linker degree (CLD).

A one-to-one interaction model is introduced to quantify the likelihood of cancer-specific DDIs. The weighted domain frequency score is adopted to measure the propensity of a domain to occur in cancer and non-cancer proteins. Finally, the cancer linker degree is defined to gauge the interaction partners of cancer and non-cancer proteins. These results are then integrated to predict PPIs among cancer proteins.
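
As a rough illustration of the cancer linker degree idea, the sketch below counts, for each protein in a small PPI list, how many of its distinct interaction partners belong to a known cancer-protein set. The abstract does not give the formal definition, so this counting rule, the function name `cancer_linker_degree`, and the toy protein identifiers are all assumptions made for illustration, not the thesis's actual method.

```python
from collections import defaultdict


def cancer_linker_degree(interactions, cancer_proteins):
    """Count, per protein, how many distinct partners are cancer proteins.

    ASSUMPTION: this counting rule is a guess at what "cancer linker
    degree" measures; the thesis may define it differently.
    """
    cancer = set(cancer_proteins)
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        # PPIs are treated as undirected, so record both directions.
        partners[a].add(b)
        partners[b].add(a)
    # Intersect each partner set with the cancer-protein set.
    return {p: len(ps & cancer) for p, ps in partners.items()}


# Toy, made-up data: P1 and P3 are labelled as cancer proteins.
ppi = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P1", "P2")]
cancer = {"P1", "P3"}
cld = cancer_linker_degree(ppi, cancer)
```

Partners are stored in sets, so a repeated pair such as `("P1", "P2")` is counted only once. Real data would come from a PPI source such as BioGRID together with a curated cancer gene list.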

A 10-fold cross-validation test is performed to test the classification sensitivity of four classifiers: SVM, random forest, neural network, and naive Bayes. The prediction results show that the highest sensitivity, specificity, and F1 values are 95%, 86%, and 79%, respectively. We then normalize the feature values, evaluate predictions made from each single feature, and compare the advantages and disadvantages of our performance with those of other studies. The present approach is found to perform better, which suggests the feasibility of this study.
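
The evaluation described above can be sketched in plain Python. The snippet shows how sensitivity, specificity, and F1 follow from confusion-matrix counts, and how 10-fold cross-validation partitions sample indices. The helper names, the toy labels, and the unshuffled contiguous folds are assumptions for illustration; the abstract does not say which tools or splitting scheme were used, and the four classifiers are omitted.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous, non-overlapping folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Toy labels (made up): 4 cancer proteins (1) and 6 non-cancer proteins (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
f1 = f1_score(tp, fp, fn)
folds = list(k_fold_indices(25, k=10))
```

In practice the sample order would usually be shuffled, and often stratified by class, before splitting. Each classifier is then trained on the training indices of a fold and scored on its test indices, and the metrics are averaged over the ten folds.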

A website has been set up that allows users to query the data and make predictions; see http://ppi.bioinfo.asia.edu.tw/TsgOcgppi/.
TABLE OF CONTENTS
ABSTRACT (CHINESE)
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
1.1 Background
1.2 Objective
2. MATERIALS AND METHODS
2.1 The System Flowchart
2.2 Database
2.2.1 The Tumor Associated Gene Database
2.2.2 National Yang-Ming University Data Set
2.2.3 Memorial Sloan-Kettering Cancer Center (MSKCC)
2.2.4 Biological General Repository for Interaction Datasets (BioGRID)
2.2.5 The Gene Ontology Database
2.2.6 Protein Families (Pfam)
2.2.7 Swiss-Prot
2.2.8 InterPro
2.3 The Platform and Tools for Development
2.4 Source of data
2.5 Data format
2.6 Feature types
2.6.1 One-to-one interaction model
2.6.2 Weighted domain frequency scores
2.6.3 Cancer linker degree
2.6.4 Normalization of feature data
2.6.5 The four machine learning algorithms and statistical measures
3. RESULTS
3.1 Optimal parameter setting
3.2 The results of the assessment for the original data
3.3 The results of the assessment for the normalized data
3.4 The results of the assessment for single feature data
3.5 Comparison with other studies
4. DISCUSSION
5. REFERENCES