National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


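Graduate student: 李建邦 (Chien-Pang Lee)
Thesis title: The Study on Gene Selection and Sample Classification Based on Gene Expression Data Using Adaptive Genetic Algorithms / k-Nearest Neighbors Method
Thesis title (original): 利用適應性遺傳演算法結合k最近鄰法於基因表現資料進行基因篩選與樣本分類之研究
Advisor: 郭寶錚 (Bo-Jein Kuo)
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Department of Agronomy (農藝學系所)
Discipline: Agricultural sciences
Field: General agriculture
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation:
Language: Chinese
Number of pages: 100
Keywords: Microarray, Gene Selection, Genetic Algorithms, k-Nearest Neighbors
Keywords (original): 微陣列、基因篩選、遺傳演算法、k最近鄰法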


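Abstract

In recent years, microarray technology has become one of the important tools for studying gene expression. Its greatest difference from earlier methods is that it can measure the expression of thousands of genes simultaneously. Researchers have often used parametric statistical methods to search for genes with significantly different expression, but microarray data often fail to satisfy the assumptions of those methods, and testing each gene separately for significance can inflate the type I error. To address these problems, this study uses a variable selection method that requires no such assumptions to reduce the gene dimension, after which biologists can identify the truly important genes.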

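This study builds on and modifies the genetic algorithm (GA) combined with k-nearest neighbors (KNN) method (GA / KNN) proposed by Li et al. (2001a), and uses an adaptive genetic algorithm (AGA) combined with KNN (AGA / KNN) to narrow down microarray data. AGA and KNN are both long-established methods, but this study is the first to combine them for the analysis of microarray data. Because AGA is a machine-learning search tool and KNN is a nonparametric discriminant analysis method, neither is bound by distributional assumptions. AGA / KNN differs from GA / KNN chiefly in three ways: the encoding is binary, with each string containing all genes; adaptive probabilities of crossover and mutation are added; and an extinction and immigration strategy is added. A GA finds near-optimal solutions, and the best string usually differs from run to run. AGA / KNN therefore applies the idea of resampling: it is run repeatedly to collect many well-performing strings, the genes are ranked by their cumulative frequency of appearance in those strings, and the gene set is reduced according to that ranking.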
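The binary encoding and the KNN-based evaluation described above can be sketched as follows. This is a minimal illustration, not the thesis's fitness function, which the abstract does not specify: it assumes a string is a 0/1 mask over all genes, that fitness is the leave-one-out accuracy of a majority-vote k-NN classifier on the selected genes, and that class labels are non-negative integers. The function name `knn_loo_fitness` and the default `k=3` are hypothetical choices.

```python
import numpy as np

def knn_loo_fitness(X, y, mask, k=3):
    """Leave-one-out k-NN accuracy using only the genes where mask == 1.

    X: (samples, genes) expression matrix; y: integer class labels;
    mask: 0/1 vector with one bit per gene (the binary string).
    """
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0  # assumption: an empty gene subset gets the worst fitness
    Z = X[:, cols]
    # Pairwise squared Euclidean distances between samples.
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbor
    correct = 0
    for i in range(len(y)):
        nn = np.argsort(d[i])[:k]  # indices of the k nearest other samples
        votes = np.bincount(y[nn], minlength=y.max() + 1)
        # Majority vote; ties go to the smallest label (argmax behavior).
        correct += int(votes.argmax() == y[i])
    return correct / len(y)
```

A search procedure would call this function once per string in the population; strings that select informative genes score closer to 1.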

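AGA / KNN is first applied, as a case study, to the oligonucleotide chip data from colon cancer patients of Alon et al. (1999). The cDNA chip data from the study of the mouse apo AI gene by Callow et al. (2000) are then used to compare the gene selection ability of AGA / KNN and GA / KNN. The results show that both methods narrow down the set of genes and classify the samples correctly. However, AGA / KNN selects the correct genes more reliably than GA / KNN and consumes only half the CPU time. On the data analyzed here, the overall performance of AGA / KNN should therefore be no worse than that of GA / KNN. Based on these analyses and comparisons, we suggest running AGA / KNN for about 100 runs and then selecting the 50 to 100 genes with the highest cumulative frequency of appearance; the genes found this way should cover the important genes and classify the samples correctly.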
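The ranking step in this recommendation (collect the best string from each run, count how often each gene appears, keep the most frequent genes) can be sketched as below. The function name `rank_genes`, the tie-breaking rule, and the toy masks are illustrative assumptions, not details from the thesis.

```python
from collections import Counter

def rank_genes(best_masks, top_n=50):
    """Rank genes by how often they are selected across the saved best strings.

    best_masks: iterable of 0/1 sequences, one per run, one bit per gene.
    Returns up to top_n (gene_index, count) pairs, most frequent first;
    ties are broken by the lower gene index (an arbitrary choice).
    """
    counts = Counter()
    for mask in best_masks:
        counts.update(i for i, bit in enumerate(mask) if bit)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Toy example: three runs over four genes.
masks = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0]]
print(rank_genes(masks, top_n=2))
```

With about 100 saved strings, `top_n` would be set between 50 and 100 to follow the suggestion in the text.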

English abstract

Microarray technology has become a valuable tool for studying gene expression in recent years. The main difference between microarrays and traditional methods is that a microarray can measure thousands of genes at the same time. In the past, researchers often used parametric statistical methods to find significant genes. However, microarray data often violate the assumptions of parametric methods, and the type I error becomes inflated when each gene is tested for significance separately. This research therefore sought a variable selection method free of such assumptions to reduce the dimension of the data set. After applying the proposed method, biologists can select the relevant genes from the reduced gene set.
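The inflation of the type I error under per-gene testing can be made concrete with a small calculation. As a minimal sketch, assume the m per-gene tests are independent and each is run at level alpha (an idealization, since expression levels of real genes are correlated); the probability of at least one false positive is then 1 - (1 - alpha)^m. The function name and the example values are illustrative and do not come from the thesis.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent tests,
    each performed at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** m

# With thousands of genes the rate is essentially 1 at alpha = 0.05.
for m in (1, 10, 100, 2000):
    print(m, familywise_error_rate(0.05, m))
```

Under this assumption the rate already exceeds 0.99 at 100 tests, which is why testing every gene separately without correction is unreliable.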

In this study, adaptive genetic algorithms / k-nearest neighbors (AGA / KNN) was used to reduce the dimension of the data set. It is based on genetic algorithms / k-nearest neighbors (GA / KNN), first described by Li et al. (2001a). Although AGA and KNN are both well developed, this is the first use of AGA / KNN to analyze microarray data. Since AGA is a machine learning tool and KNN is a nonparametric discriminant analysis, both can be used without restrictive assumptions. There are three main differences between AGA / KNN and GA / KNN. First, the encoding is binary, and each string includes all genes. Second, adaptive probabilities of crossover and mutation are added. Finally, an extinction and immigration strategy is added. Since a GA finds only a near-optimal solution, the best string often differs from run to run. AGA / KNN was therefore repeated over many runs, and the best string of each run was saved. The frequency with which each gene appeared in those strings was then used to reduce the dimension of the data set.
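The adaptive probabilities of crossover and mutation can be illustrated with the scheme of Srinivas and Patnaik (1994), which appears in the reference list. The abstract does not say which variant or constants the thesis uses, so the sketch below is an assumption: it follows the published form in which above-average strings get probabilities that shrink toward zero as their fitness approaches the population maximum, with constants k1 = k3 = 1.0 and k2 = k4 = 0.5. The guard for a population whose maximum equals its average is an added assumption to avoid division by zero.

```python
def adaptive_pc(f_prime, f_max, f_avg, k1=1.0, k3=1.0):
    """Crossover probability for a parent pair.

    f_prime: the larger fitness of the two parents;
    f_max, f_avg: maximum and average fitness of the current population.
    """
    if f_prime < f_avg or f_max == f_avg:
        return k3  # below-average pairs are always recombined
    return k1 * (f_max - f_prime) / (f_max - f_avg)

def adaptive_pm(f, f_max, f_avg, k2=0.5, k4=0.5):
    """Mutation probability for a single string with fitness f."""
    if f < f_avg or f_max == f_avg:
        return k4  # below-average strings are mutated heavily
    return k2 * (f_max - f) / (f_max - f_avg)

# The best string is protected; a mid-ranked one is disrupted moderately.
print(adaptive_pc(1.0, 1.0, 0.5), adaptive_pc(0.75, 1.0, 0.5))
print(adaptive_pm(1.0, 1.0, 0.5), adaptive_pm(0.75, 1.0, 0.5))
```

The design intent of this scheme is to preserve good strings while keeping the rest of the population diverse, which complements an extinction and immigration step that replaces part of the population with new random strings.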

In this study, the colon data set, obtained from a high-density oligonucleotide chip (Alon et al., 1999), was analyzed. In addition, the mouse apo AI data set, obtained from a cDNA chip (Callow et al., 2000), was used to compare the gene selection ability of AGA / KNN and GA / KNN. Both methods reduced the dimension of the data set and classified all samples correctly. However, AGA / KNN selected genes more accurately than GA / KNN and took only half the CPU time. On these data, the performance of AGA / KNN should therefore be no worse than that of GA / KNN. Finally, we suggest that when AGA / KNN is used to analyze microarray data, the 50 to 100 most frequent genes be selected after about 100 runs. The selected genes should include the relevant genes and classify the samples correctly.

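Table of Contents
Chinese Abstract
Abstract
Chapter 1 Introduction
Chapter 2 Literature Review
1. Preface
2. Introduction to Biochips
(1) cDNA chip fabrication process
(2) Image analysis
(3) Normalization
3. Statistical Analysis Methods for Biochips
(1) Class discovery
(2) Class comparison
(3) Class prediction
4. Genetic Algorithms
(1) Procedure of genetic algorithms
(2) Adaptive genetic algorithms
5. The GA / KNN Method
Chapter 3 The AGA / KNN Method
1. Procedure of the AGA / KNN Method
(1) Encoding
(2) Fitness function
(3) Elitism strategy
(4) Adaptive probabilities of crossover and mutation
(5) Extinction and immigration strategy
(6) Stopping criterion
2. Case Study of the AGA / KNN Method
(1) Data description
(2) Analysis procedure
3. Comparison of the AGA / KNN and GA / KNN Methods
(1) Oligonucleotide chip data from colon cancer patients
(2) cDNA chip data for the mouse apo AI gene
4. Parameters of the AGA / KNN Method
(1) Choosing the fitness function threshold
(2) Choosing the number of runs and the number of genes
Chapter 4 General Discussion
Chapter 5 References
Appendix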

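References
林汶鑫、郭寶錚、何兆銓. 2003. Theory and applications of genetic algorithms. 科學農業 51(7,8): 186-195. (in Chinese)
林汶鑫. 2003. Application of genetic algorithms to bilinear models. Master's thesis. Taichung. Department of Agronomy, National Chung Hsing University. (in Chinese)
李欣怡. 2005. Comparison of statistical analysis methods for Affymetrix high-density oligonucleotide chip experiments. Master's thesis. Taipei. Biometry Division, Graduate Institute of Agronomy, National Taiwan University. (in Chinese)
馬立人、蔣中華. 2003. Biochips, 2nd ed. 九州圖書文物有限公司. (in Chinese)
陳奕仁. 2001. Design of an active vibration controller for a linear-motor positioning stage using an adaptive genetic algorithm with an elitism strategy. Master's thesis. Kaohsiung. Institute of Mechanical Engineering, National Sun Yat-sen University. (in Chinese)
陳智豪. 2002. A study of statistical analysis methods for cDNA biochip data. Master's thesis. Taipei. Biometry Division, Graduate Institute of Agronomy, National Taiwan University. (in Chinese)
盧建銘. 2004. Comparison of different variable selection methods on near-infrared spectral data. Master's thesis. Taichung. Department of Agronomy, National Chung Hsing University. (in Chinese)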
Alizadeh, A. A., M. B. Eisen, R. E. Davis, C. Ma, I. S. Lossos, A. Rosenwald, J. C. Boldrick, H. Sabet, T. Tran, X. Yu, J. I. Powell, L. Yang, G. E. Marti, T. Moore, J. Hudson Jr, L. Lu, D. B. Lewis, R. Tibshirani, G. Sherlock, W. C. Chan, T. C. Greiner, D. D. Weisenburger, J. O. Armitage, R. Warnke, R. Levy, W. Wilson, M. R. Grever, H. C. Byrd, D. Botstein, P. O. Brown and L. M. Staudt. 2000. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769): 503 - 511.
Alon, U., N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack and A. J. Levine. 1999. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. 96: 6745 - 6750.
Bozinov, D. and J. Rahnenführer. 2002. Unsupervised technique for robust target separation and analysis of DNA microarray spots through adaptive pixel clustering. Bioinformatics 18(5): 747 - 756.
Callow, M. J., S. Dudoit, E. L. Gong, T. P. Speed and E. M. Rubin. 2000. Microarray expression profiling identifies genes with altered expression in HDL-deficient mice. Genome Res. 10: 2022 - 2029.
Chen, Y., E. R. Dougherty and M. L. Bittner. 1997. Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Opt. 2(4): 364 - 374.
Cleveland, W. S. 1979. Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 74: 829 - 836.
De Jong, K. A. 1975. An analysis of the behavior of a class of genetic adaptive systems. Ph.D. Dissertation, Department of Computer and Communication Sciences. University of Michigan.
Dobbin, K. and R. Simon. 2002. Comparison of microarray designs for class comparison and class discovery. Bioinformatics 18: 1438 - 1445.
Duggan, D. J., M. Bittner, Y. Chen, P. Meltzer and J. Trent. 1999. Expression profiling using cDNA microarrays. Nature Genet. 21: 10 - 14.
Dudoit, S., Y. H. Yang, M. J. Callow and T. P. Speed. 2002. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica 12: 111 - 139.
Efron, B., R. Tibshirani, V. Goss and G. Chu. 2000. Microarrays and their use in a comparative experiment. Tech. report. Department of Biochemistry, Stanford University.
Enas, G. G. and S. C. Choi. 1986. Choice of the smoothing parameter and efficiency of k-nearest neighbor classification. Comp. & Maths. with Appls. 12A(2): 235 - 244.
Fix, E. and J. L. Hodges. 1951. Nonparametric discrimination: consistency properties. Project No. 21-49-004. Report 4. U.S. Air Force School of Aviation Medicine. Randolph Field. TX.
Goldberg, D. E. 1985. Optimal initial population size for binary-coded genetic algorithms (TCGA report No.85001) The clearinghouse for genetic algorithms, department of engineering mechanics. Tuscaloosa: University of Alabama.
Goldberg, D. E. 1989. Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Publishing Company.
Gottardo, R., J. Besag, M. Stephens and A. Murua. 2006. Probabilistic segmentation and intensity estimation for microarray images. Biostatistics 7(1): 85 - 99.
Holland, J. H. 1975. Adaptation in natural and artificial systems. Ann Arbor: The University of Michigan Press.
Ho, S. Y., C. C. Liu and S. Liu. 2002. Design of an optimal nearest neighbor classifier using an intelligent genetic algorithm. Pattern Recogn. Lett. 23: 1495 - 1503.
Joachims, T. 1998. Text categorization with support vector machines: learning with many relevant features. Proceedings of the European Conference on Machine Learning. Berlin. 137 - 142.
Kerr, M. K., M. Martin and G. A. Churchill. 2000. Analysis of variance for gene expression microarray data. J. Comput. Biol. 7: 819 - 837.
Kerr, M. K. and G. A. Churchill. 2001. Statistical design and the analysis of gene expression microarray data. Genet. Res. 77: 123-128.
Kuncheva, L. I. 1995. Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recogn. Lett. 16: 809 - 814.
Kuncheva, L. I. and J. C. Bezdek. 1998. Nearest prototype classification: clustering, genetic algorithms or random search. IEEE Trans. Systems Man Cybernet. C. 28(1): 160-164.
Kuncheva, L. I. and L. C. Jain. 1999. Nearest neighbor classifier: Simultaneous editing and feature selection. Pattern Recogn. Lett. 20: 1149-1156.
Lee, M. L. T. 2004. Analysis of microarray gene expression data. New York. Kluwer Academic Publishers.
Li, L., T. A. Darden, C. R. Weinberg, A. J. Levine and L. G. Pedersen. 2001a. Gene Assessment and Sample Classification for Gene Expression Data Using a Genetic Algorithm / k-Nearest Neighbor Method. Comb. Chem. High Throughput Screen. 4: 727 - 739.
Li, L., C. R. Weinberg, T. A. Darden and L. G. Pedersen. 2001b. Gene selection for sample classification based on gene expression data : study of sensitivity to choice of parameters of the GA / KNN method. Bioinformatics 17(12): 1131 - 1142.
Li, L., D. M. Umbach, P. Terry and J. A. Taylor. 2004. Application of the GA/KNN method to SELDI proteomics data. Bioinformatics 20(10): 1638 - 1640.
Liu, D., T. Shi, J. A. DiDonato, J. D. Carpten, J. Zhu and Z. H. Duan. 2004. Application of genetic algorithm/k-nearest neighbor method to the classification of renal cell carcinoma. 2004 IEEE Computational Systems Bioinformatics Conference (CSB'04): 558 - 559.
Lönnstedt, I. and T. Speed. 2002. Replicated microarray data. Statistica Sinica 12: 31 - 46.
Raymer, M. L., W. F. Punch, E. D. Goodman, P. C. Sanschagrin and L. A. Kuhn. 1997. Simultaneous Feature Extraction and Selection Using a Masking Genetic Algorithm. In Proceedings of ICGA-97: 561 - 567.
Ruan, W., T. C. Giras, Z. Lin and Y. Ou. 2003. ASCAP parameter determination by an intelligent genetic algorithm. Proceedings of the 2003 IEEE/ASME Joint Rail Conference: 133 - 141.
Shannon, W., R. Culverhouse and J. Duncan. 2003. Analyzing microarray data using cluster analysis. Pharmacogenomics 4(1): 41 - 51.
Simon, R. M., E. L. Korn, L. M. McShane, M. D. Radmacher, G. W. Wright and Y. Zhao. 2003. Design and Analysis of DNA Microarray Investigations. New York. Springer.
Smyth, G. K. and T. P. Speed. 2003. Normalization of cDNA microarray data. Methods 31: 265 - 273.
Srinivas, M. and L. M. Patnaik. 1994. Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics 24(4): 656 - 666.
Tsai, C. A., H. M. Hsueh and J. J. Chen. 2004. A generalized additive model for microarray gene expression data analysis. J Biopharm Stat. 14(3): 553-573.
Westfall, P. H. and S. S. Young. 1993. Resampling-based multiple testing: Examples and methods for p-value adjustment. New York: Wiley.
Yang, Y. and X. Liu. 1999. A re-examination of text categorization methods. Proceedings of the 22nd Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval: 42 - 49.
Yang, Y. H., S. Dudoit, P. Luu, D. M. Lin, V. Peng, J. Ngai and T. P. Speed. 2002. Normalization for cDNA microarray data : a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30(4): e15.1 - e15.10.
Yao, L. and W. A. Sethares. 1994. Nonlinear parameter estimation via the genetic algorithm. IEEE Transactions on Signal Processing 42(4): 927 - 935.


