
National Digital Library of Theses and Dissertations in Taiwan

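Student: 蔡易霖
Student (English name): Yi-Lin Tsai
Thesis title: CTGR-Span : 適用於時序性微陣列資料集之多時間點基因調控循序樣式探勘方法 (CTGR-Span: A Cross-Timepoint Gene Regulation Sequential Pattern Mining Method for Time-Course Microarray Datasets)
Thesis title (English): CTGR-Span: Efficient Mining of Cross-Timepoint Gene Regulation Sequential Pattern from Time Course Microarray Datasets
Advisor: 曾新穆
Advisor (English name): Shin-Mu Tseng
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Computer Science and Information Engineering (Master's and Doctoral Program)
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar, i.e., 2011-2012)
Language: English
Number of pages: 45
Keywords (Chinese, translated): sequential patterns; multiple timepoints; microarray; gene regulation; gene expression mining
Keywords (English): Sequential patterns; Time-course microarray; Gene regulation; Gene expression mining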
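Abstract (Chinese, translated): Over the past decade, sequential pattern mining methods have been widely applied to diverse topics, such as mining customer purchasing sequences from transactional databases or mining gene regulatory sequences from biological protein sequences. With advances in biotechnology, observing gene expression readings through microarray approaches to discover gene regulations in biological or clinical settings has clearly become a significant trend. Converting a microarray dataset into the format of a transactional database allows sequential patterns that can be regarded as gene regulations to be discovered. However, no existing method can handle datasets in which each transaction contains too many items or genes, and the sequences found in such datasets are highly susceptible to item ordering. To solve this problem, we propose a new algorithm called CTGR-Span that efficiently mines gene regulations with cross-timepoint characteristics. We applied the method to two publicly released human microarray datasets with cross-timepoint characteristics, and the experimental results show that our method is more than 2,000 times faster than traditional methods in the mining process. In addition, Gene Ontology functional analysis shows that the sequences found by CTGR-Span are more biologically meaningful than those reported in previous literature. For further validation, the longest CTGR-SPs were decomposed into 390 pairwise gene regulations, one of which has already been reported in previous literature. By contrast, no previously reported gene regulation could be found among the significant genes reported for the original datasets. Therefore, CTGR-SPs can help biologists gain deeper insight into how novel gene regulations operate during biological or clinical progression.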
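The abstract's key preprocessing idea is converting a microarray dataset into a transactional (sequence) database. The record does not say how CTGR-Span builds its transactions, so the following is only a minimal sketch under stated assumptions: the input layout, the function `to_sequence_database`, the `+`/`-` item labels and the log2 fold-change `threshold` are all hypothetical. Each subject becomes one sequence, each post-baseline timepoint becomes one itemset, and a gene enters an itemset only when it is clearly up- or down-regulated relative to that subject's first timepoint. Values are assumed to be already normalized.

```python
import math
from typing import Dict, FrozenSet, List

# Hypothetical input layout: subject ID -> list of timepoints,
# each timepoint a mapping gene -> normalized (positive) expression value.
TimeCourse = Dict[str, List[Dict[str, float]]]


def to_sequence_database(
    expression: TimeCourse, threshold: float = 1.0
) -> Dict[str, List[FrozenSet[str]]]:
    """Turn each subject's time course into a sequence of itemsets.

    Illustrative discretization (an assumption, not the thesis's rule):
    compare every later timepoint with the subject's first timepoint and
    emit 'GENE+' if log2 fold change >= threshold, 'GENE-' if <= -threshold.
    """
    database: Dict[str, List[FrozenSet[str]]] = {}
    for subject, timepoints in expression.items():
        baseline = timepoints[0]
        sequence: List[FrozenSet[str]] = []
        for values in timepoints[1:]:
            items = set()
            for gene, value in values.items():
                log_fc = math.log2(value / baseline[gene])
                if log_fc >= threshold:
                    items.add(gene + "+")
                elif log_fc <= -threshold:
                    items.add(gene + "-")
            # One itemset (transaction) per post-baseline timepoint.
            sequence.append(frozenset(items))
        database[subject] = sequence
    return database


if __name__ == "__main__":
    # Made-up values for two hypothetical genes, GA and GB.
    toy = {
        "subject1": [
            {"GA": 100.0, "GB": 50.0},  # baseline
            {"GA": 210.0, "GB": 50.0},  # GA roughly doubles
            {"GA": 100.0, "GB": 20.0},  # GB drops below half
        ],
    }
    print(to_sequence_database(toy))
    # prints {'subject1': [frozenset({'GA+'}), frozenset({'GB-'})]}
```

Using each subject's first timepoint as the reference is only one simple choice; comparing consecutive timepoints, or using a population-wide reference, would yield different transactions and therefore different patterns.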
Abstract: In the past decade, sequential pattern mining methods have been widely applied to diverse problems, such as mining customer purchasing sequences from transactional databases or mining gene regulatory motifs from biological protein sequences. With the rapid development of biotechnologies, observing gene expression through microarray approaches to discover gene regulations during biological or clinical progression has become a dominant trend. By converting microarray datasets into the format of transactional databases, sequential patterns implying gene regulations can be identified. However, no existing method can effectively handle such datasets, because every transaction may contain a very large number of items (genes) and the resulting patterns are highly susceptible to item order. We therefore propose a new algorithm, CTGR-Span (Cross-Timepoint Gene Regulation Sequential Patterns), to efficiently mine cross-timepoint gene regulation sequential patterns (CTGR-SPs) and tackle these problems. In experiments on two publicly available human time-course microarray datasets, the proposed method was more than 2,000 times faster than traditional methods. Furthermore, a Gene Ontology enrichment analysis showed that the resulting patterns are biologically more meaningful than those in previous literature reports. For further evaluation, the longest CTGR-SPs were disassembled into 390 pairwise regulations, one of which had been reported in previous literature. In contrast, none of the regulations among the significant genes reported by the original dataset studies could be found in previous literature. Hence, CTGR-SPs could give biologists more insight into the mechanisms of novel gene regulations during biological or clinical progression.
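Once expression data are in sequence form, a cross-timepoint pattern can be viewed as an ordered list of itemsets, and its support as the fraction of subjects whose sequence contains those itemsets at increasing timepoints. The sketch below is a generic support check with a hypothetical maximum-gap constraint, plus one plausible way to disassemble a pattern into pairwise (earlier item, later item) regulations. It is not the CTGR-Span kernel procedure: the thesis's own parameters (minTSupp, SWS, maxTC) appear only as section titles in this record, so `max_gap`, `supports`, `support` and `pairwise_regulations` are illustrative names, and pairing every item with every item at any later position is an assumption about how regulations such as the 390 mentioned above might be counted.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Itemset = FrozenSet[str]
# Hypothetical layout: subject ID -> sequence of itemsets, one per timepoint.
SequenceDB = Dict[str, List[Itemset]]


def supports(sequence: Sequence[Itemset], pattern: Sequence[Itemset], max_gap: int) -> bool:
    """True if every itemset of `pattern` is contained, in order, in itemsets of
    `sequence`, with consecutive matched timepoints at most `max_gap` apart.

    `max_gap` is an illustrative time constraint, not the thesis's maxTC.
    """

    def match(p_idx: int, prev_t: int) -> bool:
        if p_idx == len(pattern):
            return True  # all itemsets of the pattern matched
        if p_idx == 0:
            candidates = range(len(sequence))  # first itemset may start anywhere
        else:
            # Later itemsets must follow the previous match within max_gap steps.
            candidates = range(prev_t + 1, min(len(sequence), prev_t + max_gap + 1))
        # Backtrack: an early match may violate the gap later, so try every candidate.
        return any(pattern[p_idx] <= sequence[t] and match(p_idx + 1, t) for t in candidates)

    return match(0, -1)


def support(database: SequenceDB, pattern: Sequence[Itemset], max_gap: int) -> float:
    """Fraction of subjects whose sequence supports `pattern`."""
    hits = sum(supports(seq, pattern, max_gap) for seq in database.values())
    return hits / len(database)


def pairwise_regulations(pattern: Sequence[Itemset]) -> Set[Tuple[str, str]]:
    """Assumed decomposition: pair each item with every item in any later itemset."""
    pairs: Set[Tuple[str, str]] = set()
    for i, earlier in enumerate(pattern):
        for later in pattern[i + 1:]:
            pairs.update(product(earlier, later))
    return pairs


if __name__ == "__main__":
    db: SequenceDB = {
        "s1": [frozenset({"A+"}), frozenset({"B-"}), frozenset({"C+"})],
        "s2": [frozenset({"A+"}), frozenset(), frozenset({"B-"})],
        "s3": [frozenset({"B-"}), frozenset({"A+"})],
    }
    pattern = [frozenset({"A+"}), frozenset({"B-"})]
    print(support(db, pattern, max_gap=1))  # only s1 matches within one step
    print(support(db, pattern, max_gap=2))  # s2 also matches (two steps apart)
    print(sorted(pairwise_regulations([frozenset({"A+"}), frozenset({"B-", "C+"})])))
```

The backtracking search matters because a greedy earliest match can fail a gap constraint that a later starting point would satisfy; without a gap constraint, a greedy left-to-right scan would suffice.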
ABSTRACT
Abstract (Chinese)
Acknowledgements
CONTENTS
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Motivation
1.2 Overview of Proposed Method
1.3 Contributions
1.4 Thesis Structure
Chapter 2 Related Work
2.1 Microarray Introduction
2.2 Sequential Pattern Mining
Chapter 3 Proposed Approach
3.1 Input Data Description
3.1.1 Input Microarray Datasets
3.1.2 Normalization of Microarray Datasets
3.1.3 Converting Microarray Datasets into Transactional Databases
3.2 CTGR-Span: Cross-Timepoint Gene Regulation Sequential Pattern
3.2.1 Kernel Procedure
3.2.2 Bio-logical Parameter Design
3.2.2.1 minTSupp
3.2.2.2 SWS
3.2.2.3 maxTC
Chapter 4 Experimental Results and Discussions
4.1 Performance Comparisons
4.2 Optimal Parameter Tuning
4.3 Evaluation with Literature-related GOs
4.4 Literature Evaluation and Visualization
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
VITA


