跳到主要內容

臺灣博碩士論文加值系統

(216.73.216.213) 您好!臺灣時間:2025/11/11 08:49
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:李建邦
研究生(外文):Chien-Pang Lee
論文名稱:利用適應性遺傳演算法結合k最近鄰法於基因表現資料進行基因篩選與樣本分類之研究
論文名稱(外文):The Study on Gene Selection and Sample Classification Based on Gene Expression Data Using Adaptive Genetic Algorithms / k-Nearest Neighbors Method
指導教授:郭寶錚郭寶錚引用關係
指導教授(外文):Bo-Jein Kuo
學位類別:碩士
校院名稱:國立中興大學
系所名稱:農藝學系所
學門:農業科學學門
學類:一般農業學類
論文種類:學術論文
論文出版年:2006
畢業學年度:94
語文別:中文
論文頁數:100
中文關鍵詞:微陣列基因篩選遺傳演算法k最近鄰法
外文關鍵詞:MicroarrayGene SelectionGenetic Algorithmsk-Nearest Neighbors
相關次數:
  • 被引用被引用:4
  • 點閱點閱:343
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
近年來微陣列技術(microarray technology)已成為研究基因表現的重要工具之一。而和過去研究基因表現方法最大的差異在於,微陣列技術可同時偵測數以千計基因的表現情形。過去研究者常藉母數統計方法試圖搜尋出具有顯著差異表現的基因,但微陣列資料常無法滿足母數統計方法中的前提假設,且對每一基因單獨進行顯著性的檢定時,可能會遇到第一型錯誤(type I error)過度膨脹的問題。因此為了解決上述的問題,本研究期望能利用不須前提假設的變數選取方法來降低基因維度,再經由生物學家尋找出真正重要的基因。

本研究根據Li et al.(2001a)所提出的遺傳演算法(genetic algorithms;GA)結合k最近鄰法(k-nearest neighbors;KNN)(GA / KNN法)為基礎並進行修正,另行利用適應性遺傳演算法(adaptive genetic algorithms;AGA)結合k最近鄰法(AGA / KNN法)縮減微陣列資料的範圍。AGA與KNN都是發展已久的方法,但本研究卻是首度結合兩者以應用於分析微陣列資料。由於適應性遺傳演算法是一種機器學習的搜尋工具而k最近鄰法為無母數的判別分析方法,因此在使用時並不須受限於一些前提假設。AGA / KNN法和GA / KNN法最大的不同處為:編碼方式改為二進位編碼,每一數串包含了全部的基因,並加入機率適應性交換及突變機制(adaptive probabilities of crossover and mutation)與汰換政策(extinction and immigration strategy)。由於遺傳演算法的特色為能找出近似最適解,而每次所產生的最佳數串通常都不一樣。因此AGA / KNN法利用重複抽樣的觀念,反覆執行AGA / KNN法,先找出眾多表現較佳的數串,再根據基因累積出現次數進行排序,依此排序對基因進行縮減。

本研究中先以AGA / KNN法應用於Alon et al.(1999)對結腸癌病患所進行的寡核苷酸晶片資料進行實例研究。再以Callow et al.(2000)研究老鼠apo AI基因之cDNA晶片資料比較AGA / KNN法與GA / KNN法選取基因的能力。根據分析結果,AGA / KNN與GA / KNN都能縮減基因的範圍,並可對樣本進行正確分類。但AGA / KNN法正確篩選基因的能力優於GA / KNN法,且AGA / KNN法所消耗CPU時間僅有GA / KNN法的一半。因此若依據本研究所分析的資料,AGA / KNN法整體表現應不劣於GA / KNN法。最後經本研究的分析與比較後,建議當使用AGA / KNN法時,先大約進行100次試行(runs)後,再選取累積出現次數最高的前50至前100個基因,所搜尋出的基因應可涵蓋重要的基因,並可對樣本進行正確的分類。
Microarray technology has become a valuable tool for studying gene expression in recent years. The main difference between microarray and traditional methods is that microarray can measure thousands of genes at the same time. In the past, researchers always used parametric statistical methods to find the significant genes. However, microarray data often cannot obey some assumptions of parametric statistical methods, and type I error would be over expanded while each gene was tested for significance. Therefore, this research was expected to find a variable selection method without assumptions restriction to reduce the dimension of the data set. After using the proposed method, biologists can select the relevant genes according to the sub-gene set.

In this study, adaptive genetic algorithms / k-nearest neighbors (AGA / KNN) was used to reduce the dimension of the data set, and it was based on genetic algorithms / k-nearest neighbors (GA / KNN) which was first described by Li et al.(2001a). Although AGA and KNN were well-developed, AGA / KNN was first used to analyze the microarray data. Since AGA was a machine learning tool and KNN was a nonparametric discrimination analysis, both of them could be used without assumptions restriction. There are three main differences between AGA/KNN and GA / KNN. Firstly, the encoding has become binary code, and each string included all genes. Secondly, the adaptive probabilities of crossover and mutation were added. Finally, the extinction and immigration strategy was added. Since GA can just find the near optimal solution, the best string of each run is often not the same. Here, AGA / KNN was repeated by many runs to solve that problem. Thus, lots of the best strings were saved. The frequency of gene was computed by those strings to reduce the dimension of the data set.

In this study, an original colon data which is a high-density oligonucleotide chip (Alon et al., 1999) was analyzed. In addition, mice apo AI data which is a cDNA chip (Callow et al., 2000) was also used to compare the ability of gene selection of AGA / KNN and GA / KNN. Based on the results, it was found that AGA / KNN and GA / KNN could reduce the dimension of the data set and all samples could be classified correctly. But the accuracy of AGA / KNN was higher than that of GA / KNN, and it only took half CPU time of GA / KNN. Therefore, it was claimed that the performance of AGA / KNN should not be worse than that of GA / KNN. Finally, we suggested that when AGA / KNN was employed to analyze the microarray data, the top 50 and up to 100 most frequent genes were selected after AGA / KNN were repeated about 100 runs. Those selected genes should include relevant genes, and those selected genes could classify sample correctly.
目次
頁次
中文摘要 Ι
Abstract ΙΙΙ
第一章 緒言 1
第二章 文獻回顧 3
一、 前言 3
二、 生物晶片簡介 4
(一) cDNA晶片製作流程 4
(二) 影像分析 7
(三) 正規化 11
三、 生物晶片之統計分析方法 16
(一) 類別發現 17
(二) 類別比較 20
(三) 類別預測 25
四、 遺傳演算法 27
(一) 遺傳演算法之流程 28
(二) 適應性遺傳演算法 33
五、 GA / KNN法 36
第三章 AGA / KNN法 48
一、 AGA / KNN法之流程 48
(一) 編碼 48
(二) 適合度函數 49
(三) 菁英政策 50
(四) 機率適應性交換及突變機制 50
(五) 汰換政策 50
(六) 停止標準 51
二、 AGA / KNN法之實例研究 54
(一) 資料介紹 54
(二) 分析過程 54
三、 AGA / KNN法與GA / KNN法之比較 66
(一) 結腸癌病患的寡核苷酸晶片之資料 66
(二) 老鼠apo AI基因之cDNA晶片資料 71
四、 AGA / KNN法之參數探討 79
(一) 適合度函數臨界值之決定 79
(二) 試行數與基因數之決定 80
第四章 綜合討論 85
第五章 參考文獻 88
附錄 93
林汶鑫、郭寶錚、何兆銓。2003。遺傳演算法的理論與應用。科學農業51(7,8) : 186-195。
林汶鑫。2003。遺傳演算法應用在雙線性模式之研究。碩士論文。台中。國立中興大學農藝學系碩士班。
李欣怡。2005。Affymetrix高密度寡具核苷酸晶片試驗統計分析方法之比較。碩士論文。台北。國立台灣大學農藝研究所生物統計組。
馬立人、蔣中華。2003。生物晶片 二版。九州圖書文物有限公司。
陳奕仁。2001。適應性基因演算法結合精英政策於線性馬達定位機台之主動式振動控制器設計。碩士論文。高雄。國立中山大學機械工程研究所。
陳智豪。2002。cDNA生物晶片資料的統計分析方法之研究。碩士論文。台北。國立台灣大學農藝研究所生物統計組。
盧建銘。2004。不同變數選取法在近紅外光光譜資料上之比較。碩士論文。台中。國立中興大學農藝學系碩士班。
Alizadeh, A. A., M. B. Eisen, R. E. Davis, C. Ma, I. S. Lossos, A. Rosenwald, J. C. Boldrick, H. Sabet, T. Tran, X. Yu, J. I. Powell, L. Yang, G. E. Marti, T. Moore, J. Hudson Jr, L. Lu, D. B. Lewis, R. Tibshirani, G. Sherlock, W. C. Chan, T. C. Greiner, D. D. Weisenburger, J. O. Armitage, R. Warnke, R. Levy, W. Wilson, M. R. Grever, H. C. Byrd, D. Botstein, P. O. Brown and L. M. Staudt. 2000. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769): 503 - 511.
Alon, U., N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack and A. J. Levine. 1999. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probe by oligonucleotide arrays. Proc. Natl. Acad. Sci. 96: 6745 - 6750.
Bozinov, D. and J. Rahnenführer. 2002. Unsupervised technique for robust target separation and analysis of DNA microarray spots through adaptive pixel clustering. Bioinformatics 18(5): 747 - 756.
Callow, M. J., S. Dudoit, E. L. Gong, T. P. Speed and E. M. Rubin. 2000. Microarray expression profiling identifies genes with altered expression in HDL-deficient mice. Genome Res. 10: 2022 - 2029.
Chen, Y., E. R. Dougherty and M. L. Bittner. 1997. Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Opt. 2(4): 364 - 374.
Cleveland, W. S. 1979. Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 74: 829 - 836.
De Jong, K. A. 1975. An analysis of the behavior of a class of genetic adaptive system. Ph. D. Dissertation, Department of Computer and Communication Sciences. University of Michigan.
Dobbin, K. and R. Simon. 2002.Comparison of microarray designs for class comparison and class discovery. Bioinf. 18: 1438-1445.
Duggan, D. J., M. Bittner, Y. Chen, P. Meltzer and J. Trent. 1999. Expression profiling using cDNA microarrays. Nature Genet.
21: 10-14.
Dudoit, S., Y. H. Yang, M. J. Callow and T. P. Speed. 2002. Statistical method for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica 12: 111 - 139.
Efron, B., R. Tibshirani, V. Goss and G. Chu. 2000. Microarrays and their use in a comparative experiment. Tech. report. Department of Biochemistry, Stanford University.
Enas, G. G. and S. C. Choi. 1986. Choice of the smoothing parameter and efficiency of k-nearest neighbor classification. Comp. & Maths. with Appls. 12A(2): 235 - 244.
Fix, E. and J. L. Hodges, 1951. Nonparametric discrimination: consistency properties. Project No. 21-49-004. Report 4. U.S. Air Force School of Aviation Medicine. Randolph Field. TX.
Goldberg, D. E. 1985. Optimal initial population size for binary-coded genetic algorithms (TCGA report No.85001) The clearinghouse for genetic algorithms, department of engineering mechanics. Tuscaloosa: University of Alabama.
Goldberg, D. E. 1989. Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Publishing Company.
Gottardo, R., J. Besag, M. Stephens and A. Murua. 2006. Probabilistic segmentation and intensity estimation for microarray images. Biostatistics 7(1): 85 - 99.
Holland, J. H. 1975. Adaptation in natural and artificial systems. Ann Arbor: The University of Michigan Press.
Ho, S. Y., C. C. Liu and S. Liu. 2002. Design of an optimal nearest neighbor classifier using an intelligent genetic algorithm. Pattern Recogn. Lett. 23: 1495 - 1503.
Joachims, T. 1998. Text categorization with support vector machines: learning with many relevant features. Proceedings of the European Conference on Machine Learning. Berlin. 37-142.
Kerr, N. K., M. Martin and G. A. Churchill. 2000. Analysis of variance for gene expression microarray data. J. Comput. Biol. 7: 819-837.
Kerr, M. K. and G. A. Churchill. 2001. Statistical design and the analysis of gene expression microarray data. Genet. Res. 77: 123-128.
Kuncheva, L. I. 1995. Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recogn. Lett. 16: 809 - 814.
Kuncheva, L. I. and J. C. Bezdek. 1998. Nearest prototype classification: clustering, genetic algorithms or random search. IEEE Trans. Systems Man Cybernet. C. 28(1): 160-164.
Kuncheva, L. I. and L. C. Jain. 1999. Nearest neighbor classifier: Simultaneous editing and feature selection. Pattern Recogn. Lett. 20: 1149-1156.
Lee, M. L. T., 2004. Analysis of microarray gene expression data. New York. Kluwer Academic Publishers.
Li, L., T. A. Darden, C. R. Weinberg, A. J. Levine and L. G. Pedersen. 2001a. Gene Assessment and Sample Classification for Gene Expression Data Using a Genetic Algorithm / k-Nearest Neighbor Method. Comb. Chem. High Throughput Screen. 4: 727 - 739.
Li, L., C. R. Weinberg, T. A. Darden and L. G. Pedersen. 2001b. Gene selection for sample classification based on gene expression data : study of sensitivity to choice of parameters of the GA / KNN method. Bioinformatics 17(12): 1131 - 1142.
Li, L., D. M. Umbach, P. Terry and J. A. Taylor. 2004. Application of the GA/KNN method to SELDI proteomics data. Bioinformatics 20(10): 1638 - 1640.
Liu, D., T. Shi, J. A. DiDonato, J. D. Carpten, J. Zhu and Z. H. Duan. 2004. Application of genetic algorithm/k-nearest neighbor method to the classification of renal cell carcinoma. 2004 IEEE Computational Systems Bioinformatics Conference (CSB''04): 558 - 559.
Lönnstedt, I and T. Speed. 2002. Replicated mircoarray data. Statistica Sinica 12: 31 - 46.
Raymer, M. L., W. F. Punch, E. D. Goodman, P. C. Sanschagrin and L. A. Kuhn. 1997. Simultaneous Feature Extraction and Selection Using a Masking Genetic Algorithm. In Proceedings of ICGA-97: 561 - 567.
Ruan, W., T. C. Giras, Z. Lin and Y. Ou. 2003. ASCAP parameter determination by an intelligent genetic algorithm. Proceedings of the 2003 IEEE/ASME Joint Rail Conference: 133 - 141.
Shannon, W., R. Culverhouse and J. Duncan. 2003. Analyzing microarray data using cluster analysis. Pharmacogenomics 4(1): 41 - 51.
Simon, R. M., E. L. Korn, L. M. McShane, M. D. Radmacher, G. W. Wright and Y. Zhao. 2003. Design and Analysis of DNA Microarray Investigations. New York. Springer.
Smyth, G. K. and T. P. Speed. 2003. Normalization of cDNA microarray data. Methods 31: 265 - 273.
Srinivas, M. and L. M. Patnaik. 1994. Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Transactions on System, Man and Cybernetics 24(4): 656 - 666.
Tsai, C. A., H. M. Hsueh and J. J. Chen. 2004. A generalized additive model for microarray gene expression data analysis. J Biopharm Stat. 14(3): 553-573.
Westfall, P. H. and S. S. Young. 1993. Resampling-based multiple testing: Examples and methods for p-value adjustment. New York: Wiley.

Yang, Y. and X. Liu. 1999.A re-examination of text categorization method. Proceedings of the 22nd Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval: 42-49
Yang, Y. H., S. Dudoit, P. Luu, D. M. Lin, V. Peng, J. Ngai and T. P. Speed. 2002. Normalization for cDNA microarray data : a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30(4): e15.1 - e15.10.
Yao, L. M. IEEE and W. A. Sethares. 1994. Nonlinear parameter estimation via the genetic algorithm. IEEE transactions on signal processing 42(4): 927-935
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top