
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

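Graduate student: 尤婷藝 (Ting-Yi Yu)
Thesis title: Using A Hybrid Meta-evolutionary Algorithm for Mining Classification Rules Through Microarray Data (混合式進化演算法於微陣列資料分類法則探勘之研究)
Advisor: 陳大正
Degree: Master's
Institution: National Formosa University (國立虎尾科技大學)
Department: Graduate Institute of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Year of publication: 2010
Graduation academic year: 98 (ROC calendar)
Language: Chinese
Pages: 110
Keywords: evolutionary algorithm; classification rules; data mining; microarray data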
With the rapid development of information technology, gene microarray data has become an important area of cancer classification research. Microarray data has a very large number of gene attributes but only a small number of samples, which leads to low discriminative power in classification and long computation times. How to extract the key, representative gene attributes so as to obtain higher prediction accuracy while reducing the computational cost is therefore an important research question.

This thesis proposes a rule mining model that combines a genetic algorithm with binary particle swarm optimization. The hybrid evolutionary algorithm evaluates fitness on the classification problem while simultaneously extracting fuzzy classification rules: the predictor variables, their corresponding membership functions, and the parameter values. By adjusting the attribute dimensionality of the microarray data and selecting the membership functions, the method reaches high classification accuracy using only a small number of gene attributes. The fuzzy rules also expose the relationships between data attributes and classes. To reduce the heavy computation that classification requires, the study integrates grid computing into the proposed approach. Compared with methods from the prior literature and with commercial data mining software, the experimental results show that the proposed method achieves higher classification accuracy and reduces computation time.
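The abstract names binary particle swarm optimization as the component that selects gene attributes, but this record does not give the update rule. The following is a minimal sketch of one particle update, assuming the standard discrete binary PSO formulation (a sigmoid of the velocity gives the probability that a bit is set). The function name `bpso_step` and the parameter values `w`, `c1`, `c2`, and `v_max` are illustrative assumptions, not settings taken from the thesis, and the genetic algorithm and fuzzy rule parts of the hybrid are not shown.

```python
import math
import random


def sigmoid(v):
    """Map a velocity to the probability that the corresponding bit is 1."""
    return 1.0 / (1.0 + math.exp(-v))


def bpso_step(position, velocity, personal_best, global_best, rng,
              w=1.0, c1=2.0, c2=2.0, v_max=4.0):
    """One binary-PSO update for a single particle.

    position, personal_best, global_best: lists of 0/1 bits, where a 1
    means the gene attribute at that index is selected.
    velocity: list of floats, one per bit.
    w, c1, c2, v_max are hypothetical defaults, not values from the thesis.
    """
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        # Pull the velocity toward the particle's own best and the swarm's best.
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        # Clamp so the bit probability never saturates at exactly 0 or 1.
        v = max(-v_max, min(v_max, v))
        # Sample the new bit with probability sigmoid(v).
        bit = 1 if rng.random() < sigmoid(v) else 0
        new_velocity.append(v)
        new_position.append(bit)
    return new_position, new_velocity
```

In a feature selection setting, each particle's bit list would be scored by a fitness function (for example, the accuracy of a classifier trained on the selected attributes), and the personal and global bests would be updated from those scores before the next call.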

Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
1. Introduction
1.1 Research Background and Motivation
1.2 Research Scope and Objectives
1.3 Research Steps and Process
1.4 Thesis Organization
2. Literature Review
2.1 Feature Selection
2.2 Fuzzy Inference Systems
2.3 Genetic Algorithms
2.4 Particle Swarm Optimization
2.5 Binary Particle Swarm Optimization
2.6 Grid Computing
3. Research Method
3.1 Grid Computing System
3.2 Classification Rule Mining Steps
3.3 Rule Representation by Gene Encoding
3.4 Fitness Function Evaluation
4. Analysis and Discussion of Experimental Data
4.1 Iris Data Set
4.1.1 Data Selection Rule 1
4.1.1.1 Parameter Settings
4.1.1.2 Experimental Results
4.1.2 Data Selection Rule 2
4.1.2.1 Parameter Settings
4.1.2.2 Experimental Results
4.2 Gene Microarray Data Sets
4.2.1 Colon Cancer (Colon) Data Set
4.2.1.1 Parameter Settings
4.2.1.2 Experimental Results
4.2.1.3 Type I and Type II Error Analysis
4.2.2 Chronic Myeloid Leukemia (CML) Data Set
4.2.2.1 Parameter Settings
4.2.2.2 Experimental Results
4.2.2.3 Type I and Type II Error Analysis
4.3 Grid Performance Evaluation
5. Conclusions and Future Research
References
Appendix
English Thesis Outline
Curriculum Vitae

