National Digital Library of Theses and Dissertations in Taiwan



Permanent URL of this thesis:
Graduate student: 劉冠良 (Kuan-Liang Liu)
Thesis title: 以叢集分析與距離測度為基礎之基因選取法
Thesis title (English): A Subset gene selection method based on clustering analysis and distance measure
Advisor: 翁慈宗 (Tzu-Tsung Wong)

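Degree: Master's
Institution: National Cheng Kung University
Department: Institute of Information Management
Discipline: Computing
Field: General computing
Thesis type: Academic thesis
Year of publication: 2007
Academic year of graduation:
Language: Chinese
Number of pages:
Keywords (Chinese, translated): gene selection, gene microarray, distance measure (DM) value, cancer classification, cluster analysis
Keywords (English): gene selection, distance method, gene microarray, tumor classification, clustering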

Usage statistics: cited by 6; views 361; downloads 50; bookmarked 1

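After the completion of the Human Genome Project, the next task for biologists is to understand the meaning of tens of thousands of genes and their interrelationships. Gene microarray technology stores expression data for all genes on a tiny chip, so researchers can analyze the data for all genes at once. Compared with typical statistical data, however, microarray data combine very high dimensionality with relatively few samples, which has become a bottleneck for related research. The main objective of this study is therefore to select a representative gene set for a specific problem. Researchers have proposed gene selection methods from various perspectives, but both in how they are executed and in the genes they finally select, these methods still suffer from problems such as gene collinearity, lack of consideration for gene combinations, and high overall computational complexity. This study designs a gene selection algorithm that addresses these problems. It applies density-based cluster analysis to partition the data structurally and uses the result to filter out genes that have similar expression patterns and relatively low individual gene ranking values. It then selects and replaces gene combinations according to the distance measure (DM) value, assisted by a cluster similarity index. Given the characteristics of gene expression data, the study measures similarity between genes by their degree of association. The proposed algorithm was applied to cancer classification data sets and improved final classification accuracy. Gene sets whose DM values were raised through gene replacement also achieved higher classification accuracy, and the final gene sets outperformed those chosen purely by individual gene ranking methods in both DM value and classification accuracy. These results indicate that the DM value does help gene selection and that the gene selection method designed in this study yields more representative gene sets.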
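The first stage described in the abstract (density-based clustering of genes, then filtering out genes that are similar to others and rank lower individually) can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not give the ranking statistic, the association formula, or the DBSCAN parameters, so the code assumes a Golub-style signal-to-noise ratio for two classes, uses 1 minus the Pearson correlation as a stand-in for the degree-of-association measure, and uses hypothetical `eps` and `min_pts` values. All function names are illustrative.

```python
import math
from statistics import mean, stdev


def pearson_distance(x, y):
    """1 - Pearson correlation: near 0 for co-expressed genes, up to 2 for anti-correlated ones."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # a constant profile is treated as uncorrelated
    return 1.0 - sxy / math.sqrt(sxx * syy)


def snr_score(expr, classes):
    """Assumed ranking statistic: |mean1 - mean0| / (sd1 + sd0) for two classes."""
    g0 = [v for v, c in zip(expr, classes) if c == 0]
    g1 = [v for v, c in zip(expr, classes) if c == 1]
    return abs(mean(g1) - mean(g0)) / (stdev(g1) + stdev(g0) + 1e-12)


def dbscan(n, dist, eps, min_pts):
    """Plain DBSCAN over items 0..n-1; returns one cluster id per item, -1 for noise."""
    labels = [None] * n

    def neighbours(i):
        # The eps-neighbourhood includes the item itself, as in the original DBSCAN.
        return [j for j in range(n) if dist(i, j) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(j_nbrs)
    return labels


def filter_redundant(genes, classes, eps=0.2, min_pts=2):
    """Keep the best-ranked gene of each density cluster plus every noise gene,
    returned as gene indices sorted by descending score."""
    scores = [snr_score(g, classes) for g in genes]
    clusters = dbscan(len(genes), lambda i, j: pearson_distance(genes[i], genes[j]), eps, min_pts)
    best = {}
    kept = []
    for i, c in enumerate(clusters):
        if c == -1:
            kept.append(i)  # noise gene: not similar to any other gene, so not redundant
        elif c not in best or scores[i] > scores[best[c]]:
            best[c] = i
    kept.extend(best.values())
    return sorted(kept, key=lambda i: -scores[i])


# Toy data: 3 genes (rows) x 6 samples; the first three samples are class 0.
genes = [
    [1, 2, 3, 7, 8, 9],
    [1, 2, 4, 6, 8, 9],
    [5, 1, 5, 1, 5, 1],
]
classes = [0, 0, 0, 1, 1, 1]
print(filter_redundant(genes, classes))  # [0, 2]
```

On the toy data, genes 0 and 1 are almost perfectly correlated and fall into one cluster, so only gene 0 (the higher signal-to-noise ratio) survives; gene 2 resembles neither, is labelled noise, and is kept. Keeping noise genes as singletons is a design choice assumed here, not one stated in the abstract.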

After the Human Genome Project, the next challenge for biological researchers is to understand the meaning of genes and the interrelationships among them. Because gene expression microarrays store expression data for all genes on a tiny chip, researchers can analyze the expression of all genes simultaneously. Compared with conventional statistical data, however, gene expression data have very high dimensionality and comparatively few samples, and these remain obstacles to research. The objective of this research is to select a representative set of genes for a specific problem. Although many gene selection methods have been proposed in recent years, problems such as gene collinearity, lack of consideration for gene combinations, and computational complexity have not been thoroughly examined and resolved. The gene selection algorithm of this research is tailored to these problems. We first partition the whole set of genes with a density-based clustering technique and screen out genes that are similar to others and have comparatively low individual gene rank values. We then select and substitute gene combinations according to the distance measure (DM) value and a cluster similarity index. Considering the characteristics of gene expression data, we use a relation-based measure of similarity between genes. The proposed algorithm was tested on tumor classification data, and it improved classification accuracy. Gene sets with enhanced DM values achieve higher classification accuracy. In addition, the gene sets selected by our algorithm achieve higher accuracy than those selected by individual gene ranking methods.
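The abstract compares gene sets by classification accuracy, and the outline lists K-nearest neighbors and cross-validation among the methods. The sketch below shows one way such an accuracy could be estimated for a candidate gene subset. It is an illustration under assumptions, not the thesis's evaluation procedure: k = 3, Euclidean distance, and leave-one-out folds are assumed because the abstract does not state the settings, and the function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(samples, y, gene_subset, k=3):
    """Leave-one-out accuracy of k-NN restricted to the genes in gene_subset."""
    x = [[s[g] for g in gene_subset] for s in samples]
    correct = 0
    for i in range(len(x)):
        # Hold out sample i and classify it using all remaining samples.
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += knn_predict(train_x, train_y, x[i], k) == y[i]
    return correct / len(x)


# Toy data: 6 samples (rows) x 3 genes; only gene 0 separates the classes cleanly.
samples = [
    [1, 1, 5], [2, 2, 1], [3, 4, 5],
    [7, 6, 1], [8, 8, 5], [9, 9, 1],
]
y = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(samples, y, [0]))  # 1.0
print(loocv_accuracy(samples, y, [2]))
```

Restricting the classifier to the chosen genes is what makes the comparison meaningful: a subset built from an informative gene (gene 0) scores higher than one built from an uninformative gene (gene 2).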

Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
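Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Research Framework and Procedure
Chapter 2 Literature Review
2.1 Gene Microarrays
2.1.1 Microarray Workflow
2.1.2 Types of Gene Microarray Data
2.1.3 Analysis and Applications of Gene Microarray Data
2.2 Gene Selection
2.2.1 Individual Gene Ranking Methods
2.2.2 Gene Subset Ranking Methods
2.3 Cluster Analysis
2.3.1 Steps of Cluster Analysis
2.3.2 Criteria for Cluster Analysis
2.3.3 Characteristics of Gene Microarray Data
2.4 Cross-Validation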
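Chapter 3 Research Methodology
3.1 Definitions
3.1.1 Individual Genes and Gene Combinations
3.1.2 Redundant Genes
3.1.3 Cluster Similarity
3.1.4 Optimal Combination Cluster
3.1.5 DM Value
3.2 Gene Selection Framework and Description
3.2.1 First-Stage Gene Selection
3.2.1.1 Finding Data Clusters with the DBSCAN Clustering Method
3.2.1.2 Individual Gene Ranking Methods
3.2.1.3 Initial Gene Set Selection
3.2.2 Individual Gene Replacement Procedure
3.2.3 Gene Combination Replacement Procedure
3.2.3.1 Threshold Settings
3.2.3.2 Stopping Criteria
3.3 Other Gene Selection Methods
3.4 Second-Stage Classification Methods
3.4.1 K-Nearest Neighbors
3.4.2 Support Vector Machines
3.5 Evaluation Procedure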
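Chapter 4 Empirical Study
4.1 Data Collection and Preparation
4.2 Parameter Settings
4.3 Empirical Results
4.3.1 Program Execution Efficiency
4.3.2 Changes in the DM Value during Gene Selection
4.3.3 Comparison of Individual Gene Ranking Methods Combined with Classifiers
4.3.3.1 Different Individual Gene Ranking Methods with the Same Classifier
4.3.3.2 The Same Individual Gene Ranking Method with Different Classifiers
4.3.4 Number of Gene Replacements
4.4 Summary
Chapter 5 Conclusions and Suggestions
5.1 Conclusions
5.2 Suggestions
References
Chinese
English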

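Chinese
陳健尉 (2000). 基因微陣列之簡介及其應用：二十一世紀基因分析的利器 [An introduction to gene microarrays and their applications: a powerful tool for gene analysis in the twenty-first century]. 生物醫學報導, No. 2.
陳連進 (2002). 以關聯度為基礎的基因表現叢集驗證之方法 [A relation-based method for validating gene expression clusters]. Master's thesis, Institute of Computer Science and Information Engineering, National Cheng Kung University.
周正中 (2005). 基因微陣列數據分析簡介 [An introduction to gene microarray data analysis]. 台灣醫學, 9(5), 622-627.
張雅芳 and 黃正仲 (2004). 微陣列生物科技 [Microarray biotechnology]. 科學發展, No. 381, 34-41.
鄭凱峰 (2004). 小樣本高維度資料中二階段分類法之效能評估-以基因微陣列資料癌症分類為例 [Performance evaluation of two-stage classification methods for small-sample, high-dimensional data: a case study of cancer classification with gene microarray data]. Master's thesis, Department of Industrial and Information Management, National Cheng Kung University.
許景涵 (2005). 以基因微陣列資料探討基因選取方法對分類正確率之影響 [The effect of gene selection methods on classification accuracy: a study with gene microarray data]. Master's thesis, Department of Industrial and Information Management, National Cheng Kung University.
程中慧 (2006). 無歸納偏置影響因素的基因選取之研究 [A study of gene selection free of inductive bias]. Master's thesis, Institute of Information Management, National Cheng Kung University.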

English
Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M., and Yakhini, Z. (2000). Tissue classification with gene expression profiles, Proceedings of the Fourth Annual International Conference on Computational Molecular Biology, 54-64.
Breiman, L. (1996). Bagging predictors, Machine Learning, 24, 123-140.
Daszykowski, M., Walczak, B., and Massart, D. L. (2001). Looking for natural patterns in data: Part 1. Density-based approach, Chemometrics and Intelligent Laboratory Systems, 56(2), 83-92.
Davies, D. L. and Bouldin, D. W. (1979). A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(2), 224-227.
DeRisi, J. L., Iyer, V. R., and Brown, P. O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale, Science, 278, 680-686.
Ding, C. and Peng, H. (2003). Minimum redundancy feature selection from microarray gene expression data, Proceedings of the Computational Systems Bioinformatics Conference, 523-529.
Dougherty, E. R. (2001). Small sample issues for microarray-based classification, Comparative and Functional Genomics, 2, 28-34.
Dudoit, S., Fridlyand, J., and Speed, T. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data, Journal of the American Statistical Association, 97, 77-87.
Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns, Proceedings of the National Academy of Sciences of the United States of America, 95, 14863-14868.
Ester, M., Kriegel, H. P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, 226-231.
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 286, 531-537.
Guha, S., Rastogi, R., and Shim, K. (1998). CURE: an efficient clustering algorithm for large databases, Proceedings of the ACM SIGMOD International Conference on Management of Data, 73-84.
Guha, S., Rastogi, R., and Shim, K. (2000). ROCK: a robust clustering algorithm for categorical attributes, Proceedings of the 15th International Conference on Data Engineering, 512-521.
Guyon, I., Weston, J., and Barnhill, S. (2002). Gene selection for cancer classification using support vector machines, Machine Learning, 46, 389-422.
Hanczar, B., Courtine, M., Benis, A., Hennegar, C., Clement, K., and Zucker, J. D. (2003). Improving classification of microarray data using prototype-based feature selection, ACM SIGKDD Explorations Newsletter, 5, 23-30.
Huerta, E. B., Duval, B., and Hao, J. K. (2006). A hybrid GA/SVM approach for gene selection and classification of microarray data, Lecture Notes in Computer Science, 3907, 34-44.
Jaeger, J., Sengupta, R., and Ruzzo, W. L. (2003). Improved gene selection for classification of microarrays, Pacific Symposium on Biocomputing, 53-64.
Jain, A. K. and Dubes, R. C. (1988). Algorithms for clustering data, Englewood Cliffs: Prentice Hall.
Jiang, D., Tang, C., and Zhang, A. (2004). Cluster analysis for gene expression data: a survey, IEEE Transactions on Knowledge and Data Engineering, 16(11), 1370-1386.
Jörnsten, R. and Yu, B. (2003). Simultaneous gene clustering and subset selection for sample classification via MDL, Bioinformatics, 19, 1100-1109.
Karypis, G., Han, E. H., and Kumar, V. (1999). CHAMELEON: a hierarchical clustering algorithm using dynamic modeling, IEEE Computer, 32(8), 68-75.
Kohavi, R. and John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), 273-324.
Lee, K. E., Sha, N., Dougherty, E. R., Vannucci, M., and Mallick, B. K. (2003). Gene selection: a Bayesian variable selection approach, Bioinformatics, 19, 90-97.
Li, L., Weinberg, C. R., Darden, T. A., and Pedersen, L. G. (2001). Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method, Bioinformatics, 17, 1131-1142.
Li, T., Zhang, C., and Ogihara, M. (2004). A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression, Bioinformatics, 20(15), 2429-2437.
Liu, H., Li, J., and Wong, L. (2002). A comparative study of feature selection and multiclass classification methods using gene expression profiles and proteomic patterns, Genome Informatics, 13, 51-60.
Liu, B., Wan, C., and Wang, L. (2006). An efficient semi-unsupervised gene selection method via spectral biclustering, IEEE Transactions on NanoBioscience, 5(2), 110-114.
Lu, Y. and Han, J. (2003). Cancer classification using gene expression data, Information Systems, 28, 243-268.
Nguyen, D. V. and Rocke, D. M. (2002). Tumor classification by partial least squares using microarray gene expression data, Bioinformatics, 18, 39-50.
Park, P., Pagano, M., and Bonetti, M. (2001). A nonparametric scoring algorithm for identifying informative genes from microarray data, Proceedings of the Pacific Symposium on Biocomputing, 6, 52-63.
Qian, W. N. and Zhou, A. Y. (2002). Analyzing popular clustering algorithms from different viewpoints, Journal of Software, 13(8), 1382-1394.
Su, Y., Murali, T., Pavlovic, V., Schaffer, M., and Kasif, S. (2003). RankGene: identification of diagnostic genes based on expression data, Bioinformatics, 19, 1578-1579.
Wang, Y., Makedon, F., Ford, J., and Pearlman, J. (2005). HykGene: a hybrid approach for selecting genes for phenotype classification using microarray gene expression data, Bioinformatics, 21(8), 1530-1537.
Xing, E. P., Jordan, M. I., and Karp, R. M. (2001). Feature selection for high-dimensional genomic microarray data. Proceedings of the Eighteenth International Conference on Machine Learning, 601-608.
Xiong, M., Fang, Z., and Zhao, J. (2003). Biomarker identification by feature wrappers, Genome Research, 11, 1878-1887.
Wong, T. T. and Hsu, C. H. (2006). Two-stage classification methods for microarray data, accepted by Expert Systems with Applications.
Yu, L. and Liu, H. (2004). Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 5, 1205-1224.
Yu, L. and Liu, H. (2004). Redundancy based feature selection for microarray data. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 737-742.
