National Digital Library of Theses and Dissertations in Taiwan

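Author: Kuan-Liang Liu (劉冠良)
Thesis title: A Subset Gene Selection Method Based on Clustering Analysis and Distance Measure (以叢集分析與距離測度為基礎之基因選取法)
Advisor: Tzu-Tsung Wong (翁慈宗)
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Institute of Information Management (資訊管理研究所)
Discipline: Computing
Subdiscipline: General computing
Thesis type: Academic thesis
Year of publication: 2007
Graduation academic year: 95 (ROC calendar)
Language: Chinese
Number of pages: 61
Keywords (translated from Chinese): gene selection, gene microarray, distance measure value, cancer classification, cluster analysis
Keywords (English): gene selection, distance method, gene microarray, tumor classification, clustering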
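Abstract (translated from Chinese): After the completion of the Human Genome Project, the next task for biologists is to understand what tens of thousands of genes mean and how they relate to one another. Gene microarray technology stores all gene expression data on a tiny chip, so researchers can analyze all of the gene data at the same time. Compared with ordinary statistical data, however, the very high dimensionality and relatively small number of samples in microarray data have become a bottleneck for such research. The main purpose of this study is therefore to select a representative gene set for a specific problem. Many gene selection methods have been proposed from different perspectives, but both in execution and in the resulting gene sets they still suffer from problems such as gene collinearity, a lack of consideration for combinations of genes, and overall computational complexity. This study designs a gene selection algorithm that addresses these problems. Density-based clustering is used to partition the data structurally, which allows genes that have similar expression and relatively low individual ranking values to be filtered out. The distance measure value (DM) and a cluster similarity index then guide the selection and replacement of gene combinations. Given the characteristics of gene data, the degree of relation between genes is used as the similarity measure. The proposed algorithm was applied to cancer classification data sets and improved the final classification performance. Gene sets whose DM value was raised through gene replacement also achieved higher classification accuracy, and the final selected gene sets outperformed those chosen purely by individual gene ranking in both DM value and classification accuracy. This indicates that the DM value can indeed assist gene selection, and that the gene selection method designed in this study does yield more representative gene sets.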
Abstract (author's English version): After the Human Genome Project, the next challenge for biological researchers is to understand the meaning of genes and the relationships among them. Because gene expression microarray technology stores all gene expression data on a tiny chip, researchers can analyze the expression data of all genes simultaneously. Nevertheless, compared with ordinary statistical data, the huge dimensionality and comparatively small sample sizes of gene expression data remain obstacles to research. The objective of this research is to select a representative set of genes for a specific problem. Although many gene selection methods have been proposed in recent years, problems such as gene collinearity, lack of consideration for gene combinations, and computational complexity have not been thoroughly examined and resolved. The gene selection algorithm of this research is tailored to these problems. We first partition the whole gene data set using a density-based clustering technique and screen out genes that are similar and have comparatively low individual gene rank values. We then select and substitute gene combinations according to the distance measure value, DM, and a cluster similarity index. Considering the characteristics of gene expression data, we adopt a relation-based measure of similarity between genes. The proposed algorithm was tested on tumor classification data, and the classification accuracy improved. Gene sets with an enhanced DM value achieve a higher classification accuracy. In addition, the accuracy of the gene sets from our selection algorithm is better than that of gene sets from individual gene ranking methods.
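The first-stage filtering described in the abstract (density-based clustering of genes, then dropping similar genes with lower individual ranking values) can be illustrated with a minimal sketch. This is not the thesis's implementation. The distance `1 - Pearson correlation`, the signal-to-noise ranking score, the rule of keeping only the top-ranked gene per cluster, the handling of noise genes, and all names and parameter values (`select_initial_genes`, `eps`, `min_pts`, the toy data) are assumptions made for illustration; the DM measure itself is not defined in this record and is not modeled here.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def dbscan(n, dist, eps, min_pts):
    """Plain DBSCAN over items 0..n-1; returns cluster labels, -1 for noise."""
    labels = [None] * n

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual formulation.
        return [j for j in range(n) if dist(i, j) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, so expand
                queue.extend(reachable)
        cluster += 1
    return labels


def snr_score(values, classes):
    """Assumed individual ranking score: signal-to-noise for two classes."""
    a = [v for v, c in zip(values, classes) if c == 0]
    b = [v for v, c in zip(values, classes) if c == 1]
    spread = pstdev(a) + pstdev(b)
    return abs(mean(a) - mean(b)) / spread if spread else 0.0


def select_initial_genes(expr, classes, eps=0.15, min_pts=2):
    """Keep the best-ranked gene of each cluster plus every noise gene."""
    names = list(expr)

    def dist(i, j):
        return 1.0 - pearson(expr[names[i]], expr[names[j]])

    labels = dbscan(len(names), dist, eps, min_pts)
    best = {}  # cluster label -> (score, gene name)
    kept = []
    for name, label in zip(names, labels):
        if label == -1:
            kept.append(name)  # assumption: unclustered genes are retained
            continue
        score = snr_score(expr[name], classes)
        if label not in best or score > best[label][0]:
            best[label] = (score, name)
    kept.extend(name for _, name in best.values())
    return sorted(kept)


# Toy data: 5 hypothetical genes measured on 6 samples (3 per class).
classes = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1, 2, 3, 7, 8, 9],
    "g2": [1, 4, 1, 7, 9, 8],  # similar to g1 but ranks lower
    "g3": [5, 1, 4, 2, 6, 3],  # unrelated to the others
    "g4": [9, 8, 7, 3, 2, 1],
    "g5": [9, 6, 9, 3, 1, 2],  # similar to g4 but ranks lower
}
selected = select_initial_genes(expr, classes)
print(selected)  # ['g1', 'g3', 'g4']
```

In this toy run, g2 and g5 are dropped because each sits in the same cluster as a higher-scoring gene, which mirrors the idea of removing genes that are both similar to others and individually less discriminative. Using a signed correlation distance treats strongly anti-correlated genes as dissimilar; a distance based on the absolute correlation would be an equally plausible choice.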
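Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research motivation
1.2 Research objectives
1.3 Research framework and procedure
Chapter 2 Literature Review
2.1 Gene microarrays
2.1.1 Microarray workflow
2.1.2 Gene microarray data types
2.1.3 Analysis and applications of gene microarray data
2.2 Gene selection
2.2.1 Individual gene ranking methods
2.2.2 Gene set ranking methods
2.3 Cluster analysis
2.3.1 Steps of cluster analysis
2.3.2 Cluster analysis criteria
2.3.3 Characteristics of gene microarray data
2.4 Cross-validation
Chapter 3 Research Method
3.1 Definitions of terms
3.1.1 Individual genes and gene combinations
3.1.2 Redundant genes
3.1.3 Cluster similarity
3.1.4 Optimal combination cluster
3.1.5 DM measure value
3.2 Gene selection framework and description
3.2.1 First-stage gene selection
3.2.1.1 Finding data clusters with the DBSCAN clustering method
3.2.1.2 Individual gene ranking methods
3.2.1.3 Initial gene set selection
3.2.2 Individual gene replacement procedure
3.2.3 Gene combination replacement procedure
3.2.3.1 Threshold setting
3.2.3.2 Stopping conditions
3.3 Other gene selection methods
3.4 Second-stage classification methods
3.4.1 K-nearest neighbor analysis
3.4.2 Support vector machines
3.5 Evaluation procedure
Chapter 4 Empirical Study
4.1 Data collection and preparation
4.2 Parameter settings
4.3 Empirical results
4.3.1 Program execution efficiency
4.3.2 Changes in the DM value during gene selection
4.3.3 Comparison of results for individual gene ranking methods combined with classifiers
4.3.3.1 Different individual gene ranking methods with the same classifier
4.3.3.2 The same individual gene ranking method with different classifiers
4.3.4 Number of gene replacements
4.4 Summary
Chapter 5 Conclusions and Suggestions
5.1 Conclusions
5.2 Suggestions
References
Chinese
English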