National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: 柯子惟 (Ko, Tzu-Wei)
Title: 以逐步SVM縮減p大n小資料型態之維度
Title (English): Dimension reduction of large p small n data sets based on stepwise SVM
Advisor: 周珮婷
Degree: Master's
Institution: National Chengchi University (國立政治大學)
Department: Department of Statistics
Discipline: Mathematics and Statistics
Academic field: Statistics
Document type: Academic thesis
Graduation academic year: 105 (2016-17)
Language: Chinese
Pages: 35
Keywords (Chinese): 維度縮減; 特徵選取; p大n小資料型態; 逐步SVM
Keywords (English): Stepwise SVM; Dimension reduction; Feature selection; Large p small n data set
Record statistics:
  • Cited by: 0
  • Views: 421
  • Downloads: 0
  • Bookmarked: 0
This study addresses dimension reduction for large p, small n data, i.e., data with many more variables than samples. We propose a stepwise SVM method and compare it against the unreduced data and three dimension reduction methods: principal component analysis (PCA), Pearson product-moment correlation coefficients (PCCs), and random-forest-based recursive feature elimination (RF-RFE). We examine whether stepwise SVM can select feature sets that better discriminate between sample classes. The research data consist of six disease-related gene expression and biological spectroscopy data sets.
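The stepwise selection idea can be sketched as a greedy forward search that, at each step, adds the variable giving the largest gain in cross-validated SVM accuracy. This is a minimal illustration under assumed details, not the thesis's exact algorithm: the linear kernel, the stopping rule (stop when no candidate improves accuracy), the `max_features` cap, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def stepwise_svm_select(X, y, max_features=5, cv=3):
    """Greedily add the feature that most improves the CV accuracy of a linear SVM."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = 0.0
    while remaining and len(selected) < max_features:
        # Score every candidate feature when added to the current subset.
        scores = []
        for j in remaining:
            cols = selected + [j]
            score = cross_val_score(SVC(kernel="linear"), X[:, cols], y, cv=cv).mean()
            scores.append((score, j))
        score, j = max(scores)
        if score <= best_score:  # stop when no candidate improves accuracy
            break
        best_score = score
        selected.append(j)
        remaining.remove(j)
    return selected, best_score

# Toy "large p, small n" data: 40 samples, 200 features, few informative.
X, y = make_classification(n_samples=40, n_features=200, n_informative=4,
                           n_redundant=0, random_state=0)
subset, acc = stepwise_svm_select(X, y)
print(subset, round(acc, 3))
```

Because the candidate loop refits an SVM for every remaining feature at every step, the cost grows with p; this is tolerable here precisely because n is small, which is the setting the thesis targets.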
First, stepwise SVM was applied for feature selection in a supervised setting; the results show that it can effectively extract, from the full variable set, the features most important for classifying the samples. Each data set was then split into training and test sets, its dimension was reduced in a semi-supervised setting with stepwise SVM, PCA, PCCs, and RF-RFE, and an SVM model was fitted to compute the prediction accuracy. This procedure was repeated 100 times, and the average was taken as each reduction method's final prediction accuracy. The results show that stepwise SVM consistently achieved higher prediction accuracy than the unreduced data; compared with the other methods, it was more stable than PCA and RF-RFE, while its difference from PCCs was less clear-cut. We conclude that dimension reduction is necessary for large p, small n data, since it effectively removes noise and thereby improves the model's overall prediction accuracy.
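The evaluation protocol described above (random split, reduce dimension, fit an SVM, average test accuracy over repetitions) can be sketched as follows, with PCA standing in for the compared reduction methods and synthetic data in place of the gene-expression sets. The 70/30 split ratio, the 10 retained components, and the reduced repetition count (20 instead of 100, for speed) are assumptions of this sketch, not values from the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for a large p, small n data set.
X, y = make_classification(n_samples=60, n_features=500, n_informative=6,
                           n_redundant=0, random_state=1)

accs = []
for seed in range(20):
    # A fresh random train/test split on every repetition.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    # The pipeline fits PCA on the training split only, avoiding leakage.
    model = make_pipeline(PCA(n_components=10), SVC(kernel="linear"))
    model.fit(X_tr, y_tr)
    accs.append(model.score(X_te, y_te))

# Average over the repetitions, as the thesis does over 100 runs.
mean_acc = float(np.mean(accs))
print(round(mean_acc, 3))
```

Swapping the `PCA` step for any other reducer (a correlation filter for PCCs, or RF-RFE via `sklearn.feature_selection.RFE` with a random forest) keeps the rest of the protocol unchanged, which is what makes the per-method accuracies comparable.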
Chapter 1  Research Motivation and Purpose  1
  Section 1  The State of Dimension Reduction for High-Dimensional, Small-Sample Data  1
  Section 2  Research Motivation and Purpose  2
Chapter 2  Literature Review  3
Chapter 3  Research Methods and Data  7
  Section 1  Stepwise SVM  7
  Section 2  Algorithms Used  8
  Section 3  Dimension Reduction Methods Used  10
  Section 4  Description of the Research Data  13
Chapter 4  Data Analysis and Results  20
  Section 1  Experimental Procedure and Analysis  20
  Section 2  Results and Method Comparison  28
Chapter 5  Conclusions and Suggestions  29
  Section 1  Conclusions  29
  Section 2  Research Limitations and Suggestions  32
References  33
Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., & Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 96(12), 6745-6750.
Bellman, R. E. (2015). Adaptive Control Processes: A Guided Tour: Princeton University Press.
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. Paper presented at the Proceedings of the fifth annual workshop on Computational learning theory, Pittsburgh, Pennsylvania, USA.
Boulesteix, A.-L. (2004). PLS Dimension Reduction for Classification with Microarray Data Statistical applications in genetics and molecular biology (Vol. 3, pp. 1).
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32. doi:10.1023/a:1010933404324
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297. doi:10.1007/bf00994018
Cunningham, P. (2008). Dimension Reduction. In M. Cord & P. Cunningham (Eds.), Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval (pp. 91-112). Berlin, Heidelberg: Springer Berlin Heidelberg.
Dai, J. J., Lieu, L., & Rocke, D. (2006). Dimension reduction for classification with gene expression microarray data. Statistical applications in genetics and molecular biology, 5(1), 1147.
Gordon, G. J., Jensen, R. V., Hsiao, L.-L., Gullans, S. R., Blumenstock, J. E., Ramaswamy, S., . . . Bueno, R. (2002). Translation of Microarray Data into Clinically Relevant Cancer Diagnostic Tests Using Gene Expression Ratios in Lung Cancer and Mesothelioma. Cancer Research, 62(17), 4963-4967.
Granitto, P. M., Furlanello, C., Biasioli, F., & Gasperi, F. (2006). Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemometrics and Intelligent Laboratory Systems, 83(2), 83-90.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of machine learning research, 3(Mar), 1157-1182.
Guyon, I., Gunn, S. R., Ben-Hur, A., & Dror, G. (2004). Result Analysis of the NIPS 2003 Feature Selection Challenge. Paper presented at the NIPS.
Hedenfalk, I., Duggan, D., Chen, Y., Radmacher, M., Bittner, M., Simon, R., . . . Trent, J. (2001). Gene-Expression Profiles in Hereditary Breast Cancer. New England Journal of Medicine, 344(8), 539-548. doi:10.1056/nejm200102223440801
Khan, J., Wei, J. S., Ringner, M., Saal, L. H., Ladanyi, M., Westermann, F., . . . Peterson, C. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature medicine, 7(6), 673-679.
Lee Rodgers, J., & Nicewander, W. A. (1988). Thirteen Ways to Look at the Correlation Coefficient. The American Statistician, 42(1), 59-66. doi:10.1080/00031305.1988.10475524
Pal, M., & Foody, G. M. (2010). Feature selection for classification of hyperspectral data by SVM. IEEE Transactions on Geoscience and Remote Sensing, 48(5), 2297-2307.
Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. Philosophical Magazine Series 6, 2(11), 559-572. doi:10.1080/14786440109462720
Shipp, M. A., Ross, K. N., Tamayo, P., Weng, A. P., Kutok, J. L., Aguiar, R. C., . . . Pinkus, G. S. (2002). Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature medicine, 8(1), 68-74.
Tang, J., Alelyani, S., & Liu, H. (2014). Feature selection for classification: A review. Data Classification: Algorithms and Applications, 37.
Ho, T. K. (1995). Random decision forests. Paper presented at the Proceedings of the 3rd International Conference on Document Analysis and Recognition.
Xu, X., & Wang, X. (2005). An Adaptive Network Intrusion Detection Method Based on PCA and Support Vector Machines. In X. Li, S. Wang, & Z. Y. Dong (Eds.), Advanced Data Mining and Applications: First International Conference, ADMA 2005, Wuhan, China, July 22-24, 2005. Proceedings (pp. 696-703). Berlin, Heidelberg: Springer Berlin Heidelberg.
Yeung, K. Y., & Ruzzo, W. L. (2001). Principal component analysis for clustering gene expression data. Bioinformatics, 17(9), 763-774. doi:10.1093/bioinformatics/17.9.763
林宗勳. Support Vector Machine 簡介 [An introduction to support vector machines; in Chinese].