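Graduate student: 陳昱超
Graduate student (English name): Yu-Chao Chen
Thesis title (Chinese): 利用循序前進選擇策略於微陣列關鍵基因選取問題
Thesis title (English): Key Gene Selection in Microarray Using Sequential Forward Selection Strategy
Advisor: 林泓毅
Degree: Master's
Institution: 國立臺中科技大學 (National Taichung University of Science and Technology)
Department: 流通管理系碩士班 (Master's Program, Department of Distribution Management)
Discipline: Business and Management
Field: Marketing and Distribution
Thesis type: Academic thesis
Year of publication: 2013
Graduation academic year: 101 (ROC calendar)
Language: Chinese
Number of pages: 55
Keywords (translated from Chinese): cluster analysis; classification problems; sequential forward selection algorithm; classification accuracy; discrimination power; attribute evaluation
Keywords (English): cluster analysis; classification problems; SFS; classification accuracy; discrimination power; attribution evaluation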
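Gene microarray expression data have high-dimensional attributes and are wide data (far more attributes than samples), and only a small fraction of the genes are relevant to classifying the data. A method that selects key genes both efficiently and effectively is therefore needed. However, combining a group of gene attributes that each have good discrimination power does not necessarily produce the best classification result, because several gene attributes may repeat similar classification work. To ensure that the characterizing attributes taking part in classification each play a distinct role and together achieve the greatest classification effect, this study adopts sequential forward selection (SFS) as the attribute selection strategy, since SFS can substantially reduce the time needed to select feature variables. In addition, this study applies fuzzy cluster analysis to the gene microarray data; this simplifies the data and quickly highlights the attributes that have discrimination power. We use an entropy-based attribute evaluation criterion, applying both the traditional "individual attribute evaluation" and "attribute set evaluation" methods to assess the discrimination capability of all original attributes, with the aim of evaluating attribute discrimination power more accurately and quickly. Finally, we use six microarray datasets to validate the proposed design and method: the selected attributes are used to build six common classifiers in order to observe how the clustering step, the evaluation methods, and sequential forward selection affect classification accuracy and discrimination ability (ROC area).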
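The entropy-based evaluation of a single attribute can be illustrated with information gain. This record does not give the thesis's exact criterion, so the following is only a minimal sketch under stated assumptions: expression values are assumed to have already been reduced to discrete cluster labels, and the function names and toy data (`gene_a`, `gene_b`) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on one discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: discretized expression levels of two genes in six samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
gene_a = ["high", "high", "high", "low", "low", "low"]  # separates the classes
gene_b = ["high", "low", "high", "low", "high", "low"]  # carries little information

print(information_gain(gene_a, labels))
print(information_gain(gene_b, labels))
```

An attribute that splits the samples into pure groups receives the full class entropy as its gain, while an attribute whose groups stay mixed receives a gain close to zero.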

A high-dimensional feature space, a small number of instances, and only a limited number of key genes critical for bioinformation classification problems are three characteristics of microarray analysis. On one hand, the selection of discriminative genes is important. On the other hand, a collection of discriminative genes does not necessarily lead to good classification quality, because some attributes are likely to have similar classification effects and in turn produce redundant classification results. In order to generate subsets of genes with not only sufficient but also necessary discrimination power for bioinformation classification problems, this thesis proposes a novel selection strategy that integrates fuzzy cluster analysis and information gain (IG) into the traditional sequential forward selection (SFS) algorithm. In terms of classification accuracy and discrimination power, the experimental results obtained from six microarray datasets show that the strategy can efficiently select compact subsets of characterizing genes and that the selected genes are suitable for various conventional classifiers.
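The idea of combining information gain with sequential forward selection so that redundant genes are skipped can be sketched as a greedy loop. This is an illustrative sketch, not the thesis's implementation: it assumes attributes are already discretized (for example by a clustering step), scores a candidate subset by the information gain of its joint values, and uses hypothetical names (`subset_gain`, `sequential_forward_selection`, `g1` to `g3`) and toy data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def subset_gain(columns, labels):
    """Information gain of the labels given the joint values of several attributes."""
    total = len(labels)
    groups = {}
    for joint, label in zip(zip(*columns), labels):
        groups.setdefault(joint, []).append(label)
    return entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())


def sequential_forward_selection(data, labels, max_size, min_improvement=1e-9):
    """Greedily add the attribute that most improves the subset's joint gain."""
    selected, best_score = [], 0.0
    remaining = set(data)
    while remaining and len(selected) < max_size:
        scores = {
            name: subset_gain([data[s] for s in selected] + [data[name]], labels)
            for name in sorted(remaining)
        }
        candidate = max(sorted(scores), key=scores.get)
        if scores[candidate] - best_score < min_improvement:
            break  # no remaining attribute adds discrimination power
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = scores[candidate]
    return selected, best_score


# Hypothetical toy data: g2 duplicates g1, g3 is weaker alone but complements g1.
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
data = {
    "g1": [0, 0, 0, 1, 1, 1, 1, 1],
    "g2": [0, 0, 0, 1, 1, 1, 1, 1],
    "g3": [1, 1, 0, 0, 1, 1, 1, 1],
}

selected, score = sequential_forward_selection(data, labels, max_size=3)
print(selected, score)
```

In this toy case `g2` scores as well as `g1` individually but adds nothing once `g1` is chosen, so the loop picks `g3` instead and then stops. This is the redundancy effect both abstracts describe.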

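Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
Chapter 2 Literature Review
2.1 Cluster Analysis and Attribute Evaluation
2.2 Attribute Selection
2.3 Microarray Data and Analysis
Chapter 3 Research Methods
3.1 Research Framework
3.2 Preprocessing
3.3 Attribute Evaluation
3.4 Attribute Selection
Chapter 4 Case Illustration
4.1 Data Description and Data Preprocessing
4.2 Attribute Evaluation Analysis
4.3 Sequential Forward Attribute Selection
4.4 Classification Performance
Chapter 5 Experimental Results
5.1 Experimental Data
5.2 Experimental Results and Overall Analysis
5.3 Classification Accuracy and Classifier Discrimination Power
Chapter 6 Conclusions and Suggestions
6.1 Conclusions
6.2 Suggestions for Future Research
References
Appendix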


