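Author: Wei-Jyun Li (李瑋峻)
Thesis title: Predicting Protein Subcellular Localization Using Integrative System (利用整合式系統預測蛋白質細胞內定位)
Advisor: Eric Y. T. Juan (阮議聰)
Degree: Master's
Institution: National Taiwan Ocean University (國立臺灣海洋大學)
Department: Department of Computer Science and Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2009
Graduation academic year: 97 (ROC calendar, i.e. 2008-2009)
Language: Chinese
Number of pages: 140
Keywords (translated from Chinese): protein subcellular localization; evolutionary computation; feature extraction; amino acid index; optimization
Keywords (English): MG-PSO-DS; PSL-PR-CPR (Protein Subcellular Localization PredictoR and Characteristic ProvideR); AAwindow; C4.5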
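Abstract (translated from the Chinese original):
Predicting protein subcellular localization has been a popular topic in recent years, because knowing where a protein resides in the cell helps predict protein function, supports genome annotation, and aids drug design. Determining a protein's location experimentally is costly in both resources and time, so it has become common to predict subcellular localization computationally from the large volume of protein data available in databases. Because protein structure is complex, however, selecting features suitable for protein classification is very difficult. Many recently proposed prediction systems are therefore designed to maximize prediction accuracy, and most combine multi-feature protein descriptors with relatively complex hybrid classification systems. Although these systems are highly accurate, few of them expose protein characteristics and classification criteria that can be analyzed. This thesis therefore proposes a prediction system that provides more protein features for analysis, named PSL-PR-CPR (Protein Subcellular Localization PredictoR and Characteristic ProvideR).
In PSL-PR-CPR, amino acid indices are used to build AAwindow, a protein descriptor that is simple, easy to understand, and easy to analyze. AAwindow encodes each protein as a feature vector, and the evolutionary computation algorithm MG-PSO-DS performs feature extraction to select feature combinations suited to the C4.5 classifier for predicting protein subcellular localization. At the same time, MG-PSO-DS tunes the parameters of C4.5 to optimize its prediction performance, yielding classification rules with higher prediction accuracy. Once the rules are built, PSL-PR-CPR lists the C4.5 classification rules and collects the features that may help protein analysis, then uses the easily analyzed characteristic regions of AAwindow to provide further reference information. To evaluate PSL-PR-CPR, the thesis compares its prediction performance and system design with those of Mycobacterial PSL predictor, Gpos-PLoc, CELLO, and LocateP. The datasets are the 852 mycobacterial proteins collected in the Mycobacterial PSL predictor study and the 452 Gram-positive bacterial proteins collected in the Gpos-PLoc study. The two test sets are evaluated with five-fold and ten-fold cross-validation, respectively, and the thesis then lists sample C4.5 classification rules, the important features, and the positions of the characteristic regions on the proteins.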
Abstract (English original):
The prediction of protein subcellular localization (PSL) has become a popular field in recent years because it supports protein function prediction and genome annotation, and thus aids drug design. However, experimental methods for analyzing PSL are often expensive and time-consuming. The computational prediction of PSL, using information available in databases, has therefore become a vibrant field of study. Nevertheless, because protein structures are complex, it remains difficult to extract features from proteins that are suitable for accurate PSL prediction. Consequently, to improve prediction performance, several modern PSL prediction systems apply multi-feature protein descriptors and adopt complex hybrid prediction systems. Even though these systems achieve outstanding prediction performance, few of them provide protein characteristics and bases of classification for further analysis. This thesis therefore proposes a PSL prediction system, PSL-PR-CPR (Protein Subcellular Localization PredictoR and Characteristic ProvideR), which aims to provide more protein characteristics for analysis.
In the PSL-PR-CPR system, proteins are encoded as feature vectors by a protein descriptor, AAwindow, which uses the Amino Acid Index (AAI) to describe proteins in a simple and easily understood way. To derive a prediction model with high prediction performance, PSL-PR-CPR employs MG-PSO-DS, an evolutionary computation algorithm, to perform feature selection, choosing feature sets that are suitable for the C4.5 classifier to predict PSL. MG-PSO-DS is also applied to optimize the prediction performance of C4.5 by tuning its parameters. After constructing the prediction model, PSL-PR-CPR displays the C4.5 decision rules and provides the protein features that assist protein analysis. In addition, PSL-PR-CPR uses the easily understood nature of AAwindow to show where the important features act within the amino acid sequence, providing more information for analysis. To validate prediction performance, PSL-PR-CPR is compared with Mycobacterial PSL predictor, Gpos-PLoc, CELLO, and LocateP on two datasets: 852 mycobacterial proteins from the Mycobacterial PSL predictor study and 452 Gram-positive bacterial proteins from the Gpos-PLoc study. Five-fold cross-validation is used on the 852 mycobacterial proteins and ten-fold cross-validation on the 452 Gram-positive bacterial proteins. PSL-PR-CPR also provides samples of C4.5 decision rules, important features, and the characteristics they reveal within the amino acid sequence.
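The abstract does not define AAwindow precisely, so the following is only a minimal sketch of the general idea of describing a protein with an amino acid index over sequence windows. It assumes that a descriptor of this kind averages one index over a fixed number of equal-length segments; the function name `window_features`, the parameter `n_windows`, and the choice of the Kyte-Doolittle hydropathy scale as the example index are all illustrative assumptions, not the thesis's actual method.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy scale, used here only as an example amino acid index.
# The thesis may use different indices; this choice is an assumption.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence: str,
                    index: Dict[str, float],
                    n_windows: int = 5) -> List[float]:
    """Hypothetical windowed descriptor: mean index value per sequence segment.

    The sequence is split into n_windows nearly equal, contiguous segments and
    the amino acid index is averaged over each one. Residues missing from the
    index (for example 'X') are skipped.
    """
    if n_windows <= 0:
        raise ValueError("n_windows must be positive")
    values = [index[aa] for aa in sequence.upper() if aa in index]
    if len(values) < n_windows:
        raise ValueError("sequence is shorter than the number of windows")

    features = []
    for w in range(n_windows):
        # Integer arithmetic keeps the segments contiguous and non-empty.
        start = w * len(values) // n_windows
        end = (w + 1) * len(values) // n_windows
        segment = values[start:end]
        features.append(sum(segment) / len(segment))
    return features


if __name__ == "__main__":
    # Toy sequence, not taken from either dataset in the thesis.
    print(window_features("MKKLLAVVGALLSTAQA", KYTE_DOOLITTLE, n_windows=3))
```

Because each feature in such a vector corresponds to one index and one region of the sequence, a decision-tree rule that tests that feature can be traced back to a position on the protein, which is the kind of interpretability the abstract emphasizes.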
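Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Thesis Organization
Chapter 2 Related Work
2.1 Prediction of Subcellular Localization of Mycobacterial Proteins
2.2 Prediction of Subcellular Localization of Gram-Positive Bacterial Proteins
2.3 Predicting Protein Subcellular Localization by Voting among Several Classifiers
2.4 Predicting Protein Subcellular Localization by Integrating Multiple Prediction Systems
Chapter 3 System Architecture
3.1 Overview of the PSL-PR-CPR System
3.2 Protein Descriptors
3.2.1 Common Protein Descriptors
3.2.2 Introduction to Amino Acid Indices and Related Work on Protein Subcellular Localization Prediction
3.2.3 AAwindow
3.3 The Evolutionary Computation Algorithm MG-PSO-DS
3.4 Feature Extraction Method
3.5 The C4.5 Classifier
3.6 Ranking the Features
Chapter 4 Experiments and Analysis
4.1 Test Datasets
4.2 Criteria for Evaluating Prediction Performance
4.3 Prediction Performance and System Comparison
4.4 Analysis of Results
4.4.1 Reference Ranking of Feature Importance
4.4.2 Examples and Explanation of C4.5 Classification Rules
4.4.3 Examples of Features Acting on Protein Regions
Chapter 5 Conclusion
Experimental Environment
References
Appendix A
Appendix B - 1/2
Appendix B - 2/2
Appendix C - 1/2
Appendix C - 2/2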