National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

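Author: 陳連進
Author (English): Lien-Chin Chen
Thesis title: 以關聯度為基礎的基因表現叢集驗證之方法
Thesis title (English): A Correlation-Based Approach for Validating Gene Expression Clustering
Advisor: 曾新穆
Advisor (English): Shin-Mu Tseng
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Computer Science and Information Engineering (master's and doctoral program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: Chinese
Number of pages: 64
Keywords (translated from Chinese): correlation-based similarity; gene microarray; gene expression analysis; clustering validation; clustering
Keywords (English): correlation-based similarity; microarray; clustering validation; gene expression analysis; clustering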
This research explores correlation-based clustering validation methods suited to gene expression analysis. In biological analysis, a clustering algorithm is typically applied first to partition genes into groups that show similar patterns of variation in expression level, and a clustering validation method is then used to evaluate the validity of the result. Most similarity measures used in existing cluster analysis, however, are distance-based. Biologists actually want the genes within a cluster to share a similar expression trend rather than the same expression values, which motivates the use of correlation-based clustering and validation indices in this study.
This thesis presents an automatic clustering validation system that guides the user toward a suitable validation index during cluster analysis. We developed a generator of volumetric-cloud-type clusters to synthesize datasets of various types, and we tested how well several correlation-based validation indices assess the quality of clustering results. The system can therefore effectively recommend the best validation index for the different types of datasets that users provide.
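The contrast between distance-based and correlation-based similarity described in the abstract can be illustrated with a minimal sketch. This is not the thesis's system or its modified indices: the four expression profiles are made up, the dissimilarity `1 - r` (Pearson) is an assumed choice, and `dunn_index` is the classic Dunn index with the dissimilarity function passed in as a parameter.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def euclidean(x, y):
    """Euclidean distance: sensitive to the magnitude of expression values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def corr_distance(x, y):
    """Assumed dissimilarity 1 - r: near 0 for similar trends, near 2 for opposite ones."""
    return 1.0 - pearson(x, y)


def dunn_index(clusters, dist):
    """Classic Dunn index: smallest between-cluster distance over largest diameter.

    Assumes at least two clusters and at least one cluster with two members.
    """
    min_separation = min(
        dist(a, b)
        for ci, cj in combinations(clusters, 2)
        for a in ci
        for b in cj
    )
    max_diameter = max(
        dist(a, b) for c in clusters for a, b in combinations(c, 2)
    )
    return min_separation / max_diameter


# Hypothetical expression profiles over five time points.
gene_a = [1.0, 2.0, 3.0, 2.0, 1.0]       # rises then falls, low values
gene_b = [10.0, 21.0, 30.0, 19.0, 10.0]  # similar trend, much larger values
gene_c = [3.0, 2.0, 1.0, 2.0, 3.0]       # opposite trend, low values
gene_d = [30.0, 19.0, 10.0, 21.0, 30.0]  # opposite trend, much larger values

by_trend = [[gene_a, gene_b], [gene_c, gene_d]]      # grouped by expression trend
by_magnitude = [[gene_a, gene_c], [gene_b, gene_d]]  # grouped by expression value

for name, dist in (("correlation", corr_distance), ("euclidean", euclidean)):
    print(name,
          "trend grouping:", round(dunn_index(by_trend, dist), 4),
          "magnitude grouping:", round(dunn_index(by_magnitude, dist), 4))
```

With these made-up profiles, the correlation-based index scores the trend grouping higher, while the Euclidean index scores the magnitude grouping higher. A validation index built on distance can therefore favor a partition that the biological goal would reject.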
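English Abstract
Chinese Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Introduction to Gene Expression Mining
1.2 Research Motivation
1.3 Research Objectives
1.4 Content and Organization of This Thesis
Chapter 2 Related Work
2.1 Similarity Measures
2.1.1 Distance Measures
2.1.2 Correlation Coefficients
2.1.3 Association Coefficients
2.1.4 Probabilistic Similarity Coefficients
2.2 Clustering Methods
2.2.1 Cluster Affinity Search Technique (CAST)
2.3 Cluster Validity Assessment Methods
2.3.1 Internal Criteria
2.3.1.1 Normalized Γ Statistic
2.3.2 Relative Criteria
2.3.2.1 Dunn Index
2.3.2.2 Generalized Dunn Index
2.3.2.3 Davies-Bouldin (DB) Index
Chapter 3 Correlation-Based Clustering Validation Methods
3.1 Automatic Clustering Validation System
3.1.1 Training Module
3.1.2 User Module (User Model), Analysis Module (Analysis Model), and Output Module (Output Model)
3.2 Design of Correlation-Based Validation Indices
3.2.1 Modified Dunn Index
3.2.2 Correlation-Based DB Index
Chapter 4 Dataset Design
4.1 Volumetric-Cloud Cluster Data Generator
4.2 Synthetic Datasets
4.3 Real Datasets
Chapter 5 Experimental Results and Analysis
5.1 Synthetic Dataset Experiments
5.2 Real Dataset Experiments
5.3 Low-Similarity Data: Dataset 1
5.3.1 Normalized Γ Statistic Analysis
5.3.2 DB Index Analysis
5.3.3 Generalized Dunn Index Analysis
5.4 High-Similarity Data: Dataset 2
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References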

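List of Tables
Table 1 Cluster structures of the 24 synthetic datasets
Table 2 Best-clustering scores of each validation method on the synthetic datasets
Table 3 Validation methods suited to each type of data
Table 4 Average scores on the synthetic datasets
Table 5 Experimental results on the real datasets
Table 6 Analysis results for Dataset 1 (low similarity dataset)
Table 7 Analysis results for Dataset 2 (high similarity dataset)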

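List of Figures
Figure 1 Comparison of correlation-based and distance-based similarity measures
Figure 2 Illustration of a two-dimensional cluster distribution
Figure 3 Illustration of inter-cluster index 4
Figure 4 Illustration of inter-cluster index 5
Figure 5 Illustration of inter-cluster index 6
Figure 6 Illustration of intra-cluster index 1
Figure 7 Illustration of intra-cluster index 2
Figure 8 Illustration of intra-cluster index 3
Figure 9 Automatic clustering validation system
Figure 10 Illustration of volumetric-cloud clusters
Figure 11 Illustration of the dimension selection range for a generated item
Figure 12 Flowchart of the correlation-based cluster data generator
Figure 13 Illustration of the variation range of each dimension of a seed item
Figure 14 Illustration of generating a cluster from a seed item
Figure 15 Synthetic dataset design
Figure 16 Dataset1: normalized Γ statistic curve
Figure 17 Dataset1: DB index curve
Figure 18 Dataset1: growth in the number of clusters
Figure 19 Dataset1: differences in the growth of the number of clusters
Figure 20 Dataset1: trends of the 6 inter-cluster indices
Figure 21 Dataset1: trends of the 3 intra-cluster indices
Figure 22 Dataset1: comparison of the D21 and D31 curves
Figure 23 Dataset1: comparison of the D22 and D32 curves
Figure 24 Dataset1: comparison of the D23 and D33 curves
Figure 25 Dataset1: comparison of the intra-cluster index of D11 with the D11 curve
Figure 26 Dataset1: comparison of the inter-cluster index of D21 with the D21 curve
Figure 27 Dataset2: growth in the number of clusters
Figure 28 Dataset2: differences in the growth of the number of clusters
Figure 29 Dataset2: growth in the number of clusters
Figure 30 Dataset1: trends of the 3 intra-cluster indices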