National Digital Library of Theses and Dissertations in Taiwan

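Graduate student: 吳豪謙
Graduate student (English name): Howard James Wu
Thesis title: 運用徑向基函數類神經網路在癌症基因選擇之研究
Thesis title (English): A Study on Gene Selection for Cancer Classification Using Radial Basis Function Network
Advisor: 高成炎
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2008
Academic year of graduation: 96 (ROC calendar)
Language: English
Pages: 73
Keywords (Chinese): 基因微陣列; 癌症分類; 特徵選擇; 徑向基函數類神經網路; 特徵遞迴刪除法
Keywords (English): DNA microarray; Cancer classification; Radial basis function network; Feature selection; Recursive feature elimination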
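Human experts hope to use microarray data to determine whether a patient has cancer, or to find the genes associated with cancer. Microarray data, however, contain a very large number of features (genes); humans, for example, have more than twenty thousand known genes. This makes it hard for human experts to uncover the underlying rules that determine cancer, and it is also a serious challenge for machine learning tools. We therefore need a method that ranks these genes by importance so that the decisive genes can be selected. Human experts can then spend less effort studying the selected genes, and machine learning tools can classify cancer more accurately.
In this thesis, we examine how feature selection methods affect cancer classification with DNA microarray data, focusing on the radial basis function network. Our experiments show that a radial basis function network that needs no parameter tuning achieves cancer classification accuracy close to that of a support vector machine with optimized parameters, and is much faster than the support vector machine that requires such tuning. When feature selection is used, the radial basis function network gains more accuracy than the support vector machine.
In studying feature selection, we also found that noisy genes affect the radial basis function network more than the support vector machine, so we propose a new feature selection method, QuickRBF recursive feature elimination. Our experiments show that it improves cancer classification accuracy about as much as support vector machine recursive feature elimination. We also found in the biological literature that genes selected by QuickRBF recursive feature elimination, such as Bcl-xl in lymphoma and CXCL10 in prostate cancer, are indeed related to cancer, and these genes are hard to select with statistical feature selection methods or support vector machine recursive feature elimination. We also discuss why different feature selection methods select different genes. We hope this research offers another possibility for cancer research.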
Human experts hope to use microarray data to determine whether a patient has cancer and to identify genes associated with cancer. However, microarray data contain a very large number of features (genes); for example, humans have more than twenty thousand known genes. This makes it difficult not only for human experts to discover patterns in the data but also for machine learning methods to perform well. We therefore need to rank the genes in microarray data by importance in order to select the informative ones. Such a ranking can help human experts investigate which genes lead to cancer, and it can also help machine learning methods classify cancer more accurately.
In this thesis, we study the impact of feature selection methods on cancer classifiers trained on DNA microarray data sets, focusing on the radial basis function network (RBF network). Our experiments show that the RBF network achieves accuracy similar to that of an optimized support vector machine (SVM) in much less computing time. With feature selection, the RBF network gains more in cancer classification accuracy than the SVM does.
During our study of feature selection, we observed that noisy genes affect the RBF network more than the SVM. We therefore propose a feature selection method, QuickRBF-RFE. QuickRBF can rank the importance of genes by itself, and a subset of discriminative genes can then be selected with the recursive feature elimination algorithm. Our experimental results show that QuickRBF-RFE performs similarly to SVM-RFE in cancer classification. Moreover, some of the top genes identified by QuickRBF-RFE, such as Bcl-xl in lymphoma and CXCL10 in prostate cancer, are reported in the biological literature to be associated with cancer, and these genes are difficult to identify with statistical feature selection methods or SVM-RFE. We also discuss why different feature selection methods select different genes for cancer classification. We hope that our research opens a new direction in cancer research.
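The recursive feature elimination loop described in the abstract can be illustrated with a small sketch. The abstract does not state QuickRBF's ranking criterion or how its centers are chosen, so everything below is an assumption made for illustration and is not the QuickRBF-RFE algorithm itself: every training sample serves as a Gaussian center, the output weights come from ridge-regularized least squares, and a gene is scored by how much the training error rises when that gene is dropped from the kernel while the weights stay fixed. The names `rbf_rfe`, `gamma`, and `ridge` are hypothetical.

```python
import numpy as np


def rbf_rfe(X, y, n_keep, gamma=None, ridge=1e-3):
    """Generic RFE around a toy RBF network (an illustrative stand-in, not QuickRBF).

    X: (n_samples, n_features) expression matrix; y: labels coded as +1 / -1.
    Returns (kept, eliminated): kept feature indices, and the eliminated
    indices in the order they were removed (least useful first).
    """
    active = list(range(X.shape[1]))
    eliminated = []
    while len(active) > n_keep:
        Xa = X[:, active]
        # Per-feature squared differences between every sample and every
        # center (assumption: the centers are the training samples).
        diff2 = (Xa[:, None, :] - Xa[None, :, :]) ** 2  # shape (n, n, d)
        sq_dist = diff2.sum(axis=2)
        # Assumed default kernel width: 1 / (number of active features).
        g = gamma if gamma is not None else 1.0 / len(active)
        phi = np.exp(-g * sq_dist)
        # Output weights by ridge-regularized least squares.
        gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
        w = np.linalg.solve(gram, phi.T @ y)
        base_err = np.mean((phi @ w - y) ** 2)
        # Score each feature by the rise in error when it is removed from
        # the kernel with the weights held fixed.
        scores = np.empty(len(active))
        for j in range(len(active)):
            phi_j = np.exp(-g * (sq_dist - diff2[:, :, j]))
            scores[j] = np.mean((phi_j @ w - y) ** 2) - base_err
        # Drop the feature whose removal hurts the fit the least.
        worst = int(np.argmin(scores))
        eliminated.append(active.pop(worst))
    return active, eliminated


# Synthetic data: only features 0 and 1 carry the class signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
X[:, 2:] *= 0.01  # low-variance noise features
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

kept, eliminated = rbf_rfe(X, y, n_keep=2)
print("kept:", kept)
print("eliminated (in order):", eliminated)
```

The sketch removes one feature per iteration and stores an n × n × d array of squared differences, which is fine for a toy example but would need chunked elimination and a more economical distance update for data with thousands of genes.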
Acknowledgments
Chinese Abstract
Abstract
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 DNA Microarray Experiment
1.3 Cancer Classification via DNA Microarray Data
1.4 Machine Learning Methods for Cancer Classification
1.5 Statistical Feature Selection Methods for Cancer Classification
1.6 Wrapper Approach for Cancer Classification
1.7 Proposed Method
1.8 Thesis Structure
2 QuickRBF Recursive Feature Elimination
2.1 Conventional Radial Basis Function Network
2.2 QuickRBF
2.3 Introduction to Recursive Feature Elimination
2.4 Algorithm of QuickRBF-RFE
2.5 Ranking Criterion of QuickRBF-RFE
3 RBF network versus SVM in cancer classification
3.1 RBF Network versus SVM with Full Feature Set
3.1.1 Materials
3.1.2 Full Feature Set in Cancer Classification
3.2 RBF Network versus SVM with Feature Selection
3.2.1 Choosing Feature Selection Methods
3.2.2 Feature Selection in Cancer Classification
3.3 Feature Selection Using QuickRBF-RFE
3.3.1 Materials
3.3.2 Accuracy Improvement for Selected Features
4 Comparison of Various Feature Selection Methods
4.1 Literature-based Evidences for Selected Features
4.2 Expression Profiles for Selected Features
5 Conclusion and Future work
5.1 Conclusion
5.2 Future Work
Bibliography