

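Thesis title (translated from Chinese): CTGR-Span: A Cross-Timepoint Gene Regulation Sequential Pattern Mining Method for Time-Course Microarray Datasets
Thesis title (English): CTGR-Span: Efficient Mining of Cross-Timepoint Gene Regulation Sequential Pattern from Time Course Microarray Datasets
Advisor (English name): Shin-Mu Tseng
Keywords (English): Sequential patterns; Time-course microarray; Gene regulation; Gene expression mining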
In the past decade, sequential pattern mining methods have been widely applied to many problems, such as mining customer purchase sequences from transactional databases and mining gene regulatory motifs from biological protein sequences. With the rapid development of biotechnology, observing gene expression with microarrays to discover gene regulation during biological or clinical progression has become a dominant trend. By converting microarray datasets into the format of transactional databases, sequential patterns that imply gene regulations can be identified. However, no existing method handles such datasets effectively, because each transaction may contain a very large number of items (genes) and the resulting patterns are highly sensitive to item order. We therefore propose a new algorithm, CTGR-Span (Cross-Timepoint Gene Regulation Sequential Patterns), to efficiently mine cross-timepoint gene regulation sequential patterns (CTGR-SPs) and address these problems. The proposed method was evaluated on two publicly available human time-course microarray datasets, where it outperformed traditional methods by a factor of more than 2,000 in execution efficiency. Furthermore, a Gene Ontology enrichment analysis showed that the resulting patterns are biologically more meaningful than those reported in previous literature. To further evaluate the 390 regulations obtained by decomposing the longest CTGR-SPs, we checked them against previous literature and found one of them reported there; in contrast, no such regulation was found among the significant genes reported by the original dataset studies. Hence, CTGR-SPs could give biologists more insight into the mechanisms of novel gene regulation during biological or clinical progression.
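The central preprocessing idea in the abstract, turning a time-course microarray dataset into a transactional (sequence) database so that sequential pattern mining can be applied, can be illustrated with a small Python sketch. This is not CTGR-Span itself. The discretization rule (a gene becomes item `GENE+` or `GENE-` when its log2 fold change crosses a hypothetical `threshold`), the hypothetical `max_gap` time constraint, the brute-force counting of length-2 cross-timepoint patterns, and the function names `to_sequence_db` and `cross_timepoint_pairs` are all illustrative assumptions. The toy expression values are made up.

```python
from collections import Counter
from itertools import product


def to_sequence_db(expr, threshold=1.0):
    """Convert a time-course expression table into a sequence database.

    ASSUMPTION (illustrative only): expr maps sample_id -> list of timepoints,
    each timepoint a dict gene -> log2 fold change. A gene becomes item
    "GENE+" (up) or "GENE-" (down) when |fold change| >= threshold.
    """
    db = {}
    for sample, timepoints in expr.items():
        seq = []
        for tp in timepoints:
            items = set()
            for gene, fc in tp.items():
                if fc >= threshold:
                    items.add(gene + "+")
                elif fc <= -threshold:
                    items.add(gene + "-")
            seq.append(frozenset(items))  # one itemset (transaction) per timepoint
        db[sample] = seq
    return db


def cross_timepoint_pairs(db, min_support=0.5, max_gap=1):
    """Brute-force count of length-2 patterns <a at t_i, b at t_j>.

    Only pairs with 0 < t_j - t_i <= max_gap (a hypothetical time constraint)
    are counted. Support is the fraction of samples that contain the pattern
    at least once; patterns below min_support are dropped.
    """
    counts = Counter()
    for seq in db.values():
        seen = set()  # count each pattern at most once per sample
        for i, earlier in enumerate(seq):
            last = min(i + max_gap, len(seq) - 1)
            for j in range(i + 1, last + 1):
                for a, b in product(earlier, seq[j]):
                    seen.add((a, b))
        counts.update(seen)
    n = len(db)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    expr = {
        "p1": [{"A": 1.5, "B": 0.2}, {"A": 0.1, "B": -1.2}, {"C": 2.0}],
        "p2": [{"A": 1.1}, {"B": -1.0, "C": 0.5}, {"C": 1.3}],
        "p3": [{"A": 0.3}, {"B": -1.5}, {"C": 0.0}],
    }
    db = to_sequence_db(expr, threshold=1.0)
    for pattern, support in sorted(cross_timepoint_pairs(db, 0.5, 1).items()):
        print(pattern, round(support, 3))
```

On the toy data, the patterns (`A+` then `B-`) and (`B-` then `C+`) each occur in two of the three samples (support 2/3). Because every item at one timepoint is paired with every item at a later timepoint, the work grows quadratically with the number of genes per timepoint. This hints at why the abstract identifies transactions containing very many genes as a problem for conventional miners.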
Abstract
Acknowledgements
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Motivation
1.2 Overview of Proposed Method
1.3 Contributions
1.4 Thesis Structure
Chapter 2 Related Work
2.1 Microarray Introduction
2.2 Sequential Pattern Mining
Chapter 3 Proposed Approach
3.1 Input Data Description
3.1.1 Input Microarray Datasets
3.1.2 Normalization of Microarray Datasets
3.1.3 Converting Microarray Datasets into Transactional Databases
3.2 CTGR-Span: Cross-Timepoint Gene Regulation Sequential Pattern
3.2.1 Kernel Procedure
3.2.2 Biological Parameter Design
minTSupp
SWS
maxTC
Chapter 4 Experimental Results and Discussions
4.1 Performance Comparisons
4.2 Optimal Parameter Tuning
4.3 Evaluation with Literature-related GOs
4.4 Literature Evaluation and Visualization
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References