National Digital Library of Theses and Dissertations in Taiwan

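Graduate student: 張淑淨
Graduate student (English name): Shu-Jing Chang
Thesis title: 非線性基因選取方法
Thesis title (English): Nonlinear Gene Selection Method
Advisors: 洪慧念, 洪志真
Advisors (English names): Hui-Nien Hung, Jyh-Jen Horng Shiau
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Statistics
Discipline: Mathematics and Statistics
Field: Statistics
Thesis type: Academic thesis
Year of publication: 2004
Graduation academic year: 92 (ROC calendar)
Language: English
Number of pages: 50
Keywords (Chinese): 非線性; 基因選取
Keywords (English): Nonlinear; Gene Selection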
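Microarray data usually contain a very large number of genes (several thousand), whereas the number of tumor samples is under 100. Selecting, from this large set of genes, those that are significantly related to classification is called gene selection (or feature selection). In this thesis we review some gene selection methods and how statisticians handle the "large p, small n" problem. The method we focus on is Support Vector Machines (SVMs), and we study linear and nonlinear classification problems through simulation experiments. For linear classification problems, we mainly examine the effect of correlation among genes and the case where the data partly overlap; for nonlinear classification problems, we use two gene selection methods and compare the important genes they select and their classification accuracy.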
Microarray data contain a large number p of genes (usually several thousand) and a small number n of patients (usually about 100 or fewer). The problem of identifying the features that best discriminate among the classes, so as to improve the performance of a classifier, is known as feature selection. Some current feature selection methods and approaches to the "large p, small n" problem are reviewed. Support Vector Machines (SVMs) have proved to perform excellently in practice as a classification methodology. For the linear classification problem, this paper studies two issues: (i) how the number of surrogates a gene has affects the importance assigned to that gene; (ii) the case of overlapping classes. For the nonlinear classification problem, we use two procedures: (1) mapping the original nonlinearly separable data to a high-dimensional space and then using SVM RFE with a linear kernel to find crucial genes; (2) using SVM RFE with a nonlinear kernel. We then compare these two methods on a nonlinear toy problem.
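The abstract refers to SVM RFE (recursive feature elimination) with a linear kernel. The sketch below illustrates the general idea as described by Guyon et al. [9]: train a linear SVM, rank features by the squared weight w_j^2, drop the lowest-ranked feature, and repeat. It is not the thesis's own code. The toy data, the function names (`train_linear_svm`, `svm_rfe`, `make_toy_data`), the Pegasos-style subgradient trainer used in place of a quadratic-programming solver, and all parameter values are assumptions made for illustration.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Fit a linear soft-margin SVM by Pegasos-style subgradient descent.

    Assumption: no bias term, which is adequate here because the toy
    classes are placed symmetrically around the origin.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [0.0] * p
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrinkage step coming from the L2 penalty.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient step when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y):
    """Return feature indices ordered from most to least important."""
    surviving = list(range(len(X[0])))
    eliminated = []  # filled from least important onward
    while surviving:
        Xs = [[row[j] for j in surviving] for row in X]
        w = train_linear_svm(Xs, y)
        # Ranking criterion: squared weight of each surviving feature.
        worst = min(range(len(surviving)), key=lambda k: w[k] ** 2)
        eliminated.append(surviving.pop(worst))
    return eliminated[::-1]


def make_toy_data(n=40, p=20, seed=1):
    """Hypothetical 'many genes, few samples' data: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[0] += 2.0 * label
        row[1] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
ranking = svm_rfe(X, y)
print("features ranked from most to least important:", ranking)
```

With this construction one would expect features 0 and 1 to appear near the front of the ranking, although the exact order depends on the random seed and the training settings. Removing one feature per round is the simplest variant; removing several per round is a common speed-up when p is in the thousands.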
ABSTRACT (in Chinese)
ABSTRACT (in English)
ACKNOWLEDGEMENTS (in Chinese)
CONTENTS
1 Introduction
2 Literature Review
3 Support Vector Machines
3.1 Linear Classifier for Linearly Separable Data
3.2 Linear Classifier for Overlapping Classes
3.3 Nonlinear Classifier for Nonlinearly Separable Data
3.4 Popular Kernel and Standard Type of Classification
4 Feature Selection Methods for Kernel Machines
4.1 Linear Case
4.2 Nonlinear Case
5 Simulation Studies
5.1 Linear problem
5.1.1 Overlapping Classes
5.1.2 Correlated Data
5.2 Nonlinear problem
5.2.1 Compare several different cases using nonlinear SVMs
5.2.2 Toy experiment
6 Conclusion and Future Research
References
[1] Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M., & Yakhini, Z. (2000). Tissue classification with gene expression profiles. J. Computational Biology 7, 559-584.

[2] Boser, B., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, ACM, pp. 144-152.

[3] Cristianini, N., Campbell, C., and Shawe-Taylor, J. (1998). Dynamically adapting kernels in support vector machines. In Advances in Neural Information Processing Systems.

[4] Fujarewicz, K., Wiench, M. (2003). Selecting differentially expressed genes for colon tumor classification, Int. J. Appl. Math. Comput. Sci., Vol. 13, No. 3, 327-335.

[5] Furey, T., Cristianini, N., Duffy, N., Bednarski, D., Schummer, M., & Haussler, D. (2000). Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16, 906-914.

[6] Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J., Caligiuri, M., Bloomfield, C., and Lander, E. (1999). Molecular classification of cancer: class discovery and prediction by gene expression monitoring. Science 286, 531-537. The data is available on-line at http://www-genome.wi.mit.edu/MPR/data_set_ALL_AML.html.

[7] Grandvalet, Y. and Canu, S. (2002). Adaptive scaling for feature selection in SVMs. In NIPS 15.

[8] Gunn, S. R. (1998). Support Vector Machines for Classification and Regression. Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton.

[9] Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning 46, 389-422.

[10] Khan, J., Wei, J., Ringner, M., Antonescu, C., Peterson, C., and Meltzer, P. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7, 673-679.

[11] Krishnapuram, B., Carin, L., Hartemink, A. (2004). Gene Expression Analysis: Joint Feature Selection and Classifier Design. In Kernel Methods in Computational Biology, Schölkopf, B., Tsuda, K., & Vert, J.-P., eds. MIT Press.

[12] Lee, Y., Lin, Y., and Wahba, G. (2001). Multicategory Support Vector Machines. Proceedings of the 33rd Symposium on the Interface. Also available as TR 1043, Statistics Dept., University of Wisconsin-Madison.

[13] Lee, Y., and Lee, C. (2002). Classification of multiple cancer types by multicategory support vector machines using gene expression data. TR 1051r, Statistics Dept., University of Wisconsin-Madison. To appear in Bioinformatics.

[14] Markowetz, F., Edler, L., and Vingron, M. (2003). Support Vector Machines for Protein Fold Class Prediction. Biometrical Journal 45, 3, 377-389.

[15] Pavlidis, P., Weston, J., Cai, J., & Grundy, W. N. (2000). Gene functional analysis from heterogeneous data. Submitted for publication.

[16] Rakotomamonjy, A. (2003). Variable Selection Using SVM-based Criteria. Journal of Machine Learning Research, 3:1357-1370.

[17] Rifkin, R. M. (2002). Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning. Massachusetts Institute of Technology.

[18] Vapnik, V. (1998). Statistical Learning Theory. New York, Wiley.

[19] Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., and Vapnik, V. (2000). Feature selection for SVMs. Advances in Neural Information Processing Systems.

[20] Zhang, X. and Wong, W. H. (2001). Recursive Sample Classification and Gene Selection based on SVM: Method and Software Description. Biostatistics Dept. Tech Report, Harvard School of Public Health.