National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

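Graduate student: 程中慧
Graduate student (English name): Chung-Hui Cheng
Thesis title (Chinese): 無歸納偏置影響因素的基因選取方法之研究
Thesis title (English): A Gene Selection Method without Inductive Bias for Cancer Classification Using Microarray Data
Advisor: 翁慈宗
Advisor (English name): Tzu-Tsung Wong
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Institute of Information Management (資訊管理研究所)
Discipline: Computer Science
Sub-discipline: General Computer Science
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar, 2005-2006)
Language: Chinese
Number of pages: 73
Keywords (translated from Chinese): microarray; gene selection; cancer classification; two-stage classification method; machine learning
Keywords (English): two-stage classification method; microarray data; gene selection; tumor classification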
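Abstract (translated from the Chinese original)
Bioinformatics has drawn wide attention in the twenty-first century. To improve human diet and health and the handling of disease, biologists need faster ways to understand human genes and to determine what each gene actually does. This demand has brought microarray technology into widespread use across many research fields. The defining feature of microarray technology is that it can analyze tens of thousands of genes at once, which is a great help to researchers.

This study addresses the use of microarray data for cancer classification and aims to build a method that identifies the group of genes able to tell cancer patients from non-patients. Samples from cancer patients are scarce, while every microarray instance carries a very large number of genes, so experiments with traditional statistical methods do not necessarily give good results. The study therefore proposes a new gene selection method that removes genes unrelated to the cancer in question and keeps the smallest decisive gene set, and it examines the inductive bias implicit in current subset-gene-ranking methods.

The proposed method is itself a subset-gene-ranking method: from all genes, it looks for one group of genes with which cancer classification reaches a good prediction accuracy. Most current subset-gene-ranking methods use a classifier as the selection mechanism. Every classifier has its own inductive bias, so the selected genes change with the classifier used, and the selection result lacks objectivity. The proposed method is therefore designed to eliminate inductive bias, with an individual-gene-ranking method as an aid. In the experiments, classification prediction accuracy did improve. In addition, the final gene set found by the proposed method with the help of an individual-gene-ranking method differs increasingly from the gene set found by that individual-gene-ranking method alone, which indicates that the proposed method finds more valuable genes that the compared methods miss. Selection without inductive bias thus uncovers a more important and more objective gene set, and a two-stage classification method built on this basis is a more trustworthy classification learning mechanism.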
Abstract (English original)
The two-stage classification method is one of the major classification methods developed for microarray data in recent years. The number of genes in a microarray instance is usually more than one thousand. The first stage of a two-stage classification method selects a pre-specified number of genes, usually between 50 and 200, and passes these genes to the second stage for classification. Gene selection methods can be divided into two categories: individual gene ranking and subset gene ranking. A subset-gene-ranking method chooses a gene subset that generally has higher discernibility on the class value. However, such a method carries the inductive bias of the classifier used to determine the discernibility of a gene subset. To overcome this deficiency, we employ the statistical DM value to design a new subset-gene-ranking method that does not contain any inductive bias. Three individual-gene-ranking methods are used to find the initial gene subset for our method. The experimental results on six microarray data sets show that the classification accuracy of our method is insensitive to the individual-gene-ranking method used to generate the initial gene subset. When a gene subset has a larger DM value, it generally results in higher classification accuracy. This suggests that a gene selection method based on the DM value has the potential to find the genes that are important for tumor classification.
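The two-stage scheme described in the abstract can be illustrated with a minimal sketch. It is not the thesis's method: the DM value is not defined in this record, so the sketch uses only an individual-gene-ranking first stage (a t-statistic, one of the rankings listed in the thesis outline) followed by a K-nearest-neighbors second stage. The Welch form of the t-statistic, the leave-one-out evaluation, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def t_statistic(values_a, values_b):
    """Welch's t-statistic for one gene measured in two classes."""
    var_a, var_b = variance(values_a), variance(values_b)
    denom = math.sqrt(var_a / len(values_a) + var_b / len(values_b))
    if denom == 0.0:
        return 0.0  # constant gene: treated as uninformative (an assumption)
    return (mean(values_a) - mean(values_b)) / denom


def rank_genes(samples, labels, top_k):
    """Stage 1: score each gene on its own and keep the top_k by |t|."""
    classes = sorted(set(labels))
    assert len(classes) == 2, "this sketch handles two classes only"
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == classes[0]]
        b = [s[g] for s, y in zip(samples, labels) if y == classes[1]]
        scores.append((abs(t_statistic(a, b)), g))
    # Highest |t| first; ties broken by gene index for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [g for _, g in scores[:top_k]]


def knn_predict(train_x, train_y, query, k=3):
    """Stage 2: majority vote among the k nearest training samples."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = [y for _, y in nearest]
    return max(sorted(set(votes)), key=votes.count)


def loocv_accuracy(samples, labels, top_k, k=3):
    """Leave-one-out accuracy; genes are re-selected inside every fold."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        genes = rank_genes(train_x, train_y, top_k)
        reduced = [[x[g] for g in genes] for x in train_x]
        query = [samples[i][g] for g in genes]
        hits += knn_predict(reduced, train_y, query, k) == labels[i]
    return hits / len(samples)


# Hypothetical toy data: 6 samples x 4 genes; only gene 0 separates the classes.
TOY_SAMPLES = [
    [5.0, 1.0, 0.2, 3.0],
    [5.2, 1.1, 0.1, 2.0],
    [4.9, 0.9, 0.3, 4.0],
    [1.0, 1.0, 0.2, 3.1],
    [1.2, 1.2, 0.1, 2.2],
    [0.9, 0.8, 0.3, 3.9],
]
TOY_LABELS = ["tumor"] * 3 + ["normal"] * 3

print(rank_genes(TOY_SAMPLES, TOY_LABELS, top_k=2))
print(loocv_accuracy(TOY_SAMPLES, TOY_LABELS, top_k=1, k=3))
```

Genes are re-ranked inside each fold so that the held-out sample never influences the selection. With so few samples and so many genes, selecting genes on the full data before cross-validating would tend to overstate accuracy.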
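Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Research Framework and Procedure
Chapter 2 Literature Review
2.1 Microarrays
2.1.1 Types of Microarrays
2.1.2 Applications of Microarray Data
2.2 Gene Selection Methods
2.2.1 Individual Gene Ranking
2.2.2 Subset Gene Ranking
2.2.2.1 Support Vector Machine Recursive Feature Elimination
2.3 Inductive Bias
2.4 Cancer Classification Methods
2.5 Cross-Validation
Chapter 3 Methodology
3.1 Definitions of Terms
3.2 First Stage: Gene Selection
3.2.1 Individual Gene Ranking
3.2.1.1 t-Statistic
3.2.1.2 BW Ratio
3.2.1.3 Nonparametric Scoring Algorithm
3.2.2 Subset Gene Ranking
3.2.2.1 Gene Selection Algorithm without Inductive Bias
3.2.2.2 Genetic Algorithm Combined with K-Nearest Neighbors
3.3 Second Stage: Classification
3.3.1 K-Nearest Neighbors
3.3.2 Support Vector Machine
3.4 Evaluation Procedure
3.4.1 Evaluation Criteria
Chapter 4 Empirical Study
4.1 Data Collection and Preparation
4.2 Parameter and DM Threshold Settings
4.3 Empirical Results of the DM Difference Measure Gene Selection Algorithm Combined with K-Nearest Neighbors
4.3.1 Execution Efficiency
4.3.2 Gene Similarity
4.3.3 Classification Accuracy of Different Gene Selection Methods Combined with Classifiers
4.3.3.1 Different Gene Selection Methods with K-Nearest Neighbors
4.3.3.2 Different Gene Selection Methods with Support Vector Machine
4.3.4 The Same Gene Selection Method with Different Classifiers
4.4 Summary
Chapter 5 Conclusions and Suggestions
5.1 Conclusions
5.2 Suggestions
References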