National Digital Library of Theses and Dissertations in Taiwan

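Graduate student: 黃楹楹
Graduate student (English name): Ying-Ying Huang
Thesis title (translated from Chinese): Identifying Combinations of Regulatory Factors That Cause Expression Differences among Genes of Different Functions
Thesis title (English): Genome-wide Co-occurrence Detection of Putative Regulatory Sites Based on Co-regulated Gene Clusters in Yeast Genomes
Advisor: 洪炯宗
Advisor (English name): Jorng-Tzong Horng
Degree: Master's
Institution: National Central University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2003
Academic year of graduation: 91 (ROC calendar)
Language: English
Pages: 41
Keywords (Chinese, translated): regulatory factors; gene expression
Keywords (English): regulatory sites; transcription factor binding sites; pattern discovery; mining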

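This thesis maps known transcription factor binding sites, repetitive sequences, and binding sequences predicted by motif-discovery tools onto the promoter regions upstream of genes. Data mining techniques are applied to combinations of repetitive sequences with transcription factor sites, and to combinations of tool-predicted binding sequences with transcription factor sites. Redundant rules are then removed from the resulting association rules, statistical methods are used to select the more significant ones, and putative transcription factor binding sites are sought among the repetitive sequences and predicted binding sequences that appear in those rules. Because binding by different combinations of transcription factors leads to differences in gene transcription, we identify the combinations that best discriminate among functionally related gene groups. The experiments were carried out mainly on yeast and protozoan genomes. The work yielded valuable information for transcription factor research, and the results are publicly available at http://dbms68.csie.ncu.edu.tw/REDB/.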
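The association-rule step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it only considers pairs of sites, the function name `mine_pair_rules` and the thresholds are hypothetical, and the gene-to-site assignments are invented toy data (the site strings themselves are borrowed from the figure and table captions listed further down). Support is taken as the fraction of genes whose promoter contains both sites, and confidence as the fraction of genes with the left-hand site that also contain the right-hand site.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(promoter_sites, min_support=0.3, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) rules for pairs of sites.

    promoter_sites maps a gene name to the set of sites found in its
    promoter region. Only two-site rules are considered in this sketch.
    """
    n_genes = len(promoter_sites)
    single = Counter()  # number of genes containing each site
    pair = Counter()    # number of genes containing both sites of a pair
    for sites in promoter_sites.values():
        unique = sorted(set(sites))
        single.update(unique)
        pair.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair.items():
        support = count / n_genes
        if support < min_support:
            continue
        # Test the rule in both directions: a => b and b => a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest confidence first, then highest support, then alphabetical.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical toy data: which sites occur upstream of which gene.
toy = {
    "geneA": {"CATCC", "ttt.tt", "aaatat"},
    "geneB": {"CATCC", "ttt.tt"},
    "geneC": {"CATCC", "ttt.tt", "ttgaa"},
    "geneD": {"aaatat", "ttgaa"},
}
rules = mine_pair_rules(toy, min_support=0.5, min_confidence=0.9)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} => {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

A full Apriori-style miner, as in the Agrawal references cited below, would extend this to larger site sets by growing frequent itemsets level by level; the pair-only version is kept here for brevity.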


Association rule mining, a data mining approach, is applied to find associations among combinations of candidate regulatory sites and known regulatory sites. We apply a set of statistical methods to characterize the site combinations in a co-regulated gene group and compare them statistically against other co-regulated gene groups, in order to find the site combinations that occur preferentially, and with significant frequency, in a specific gene group. The regulatory sites in these gene group-specific site combinations are putative transcription factor binding sites. The methodology introduced here facilitates the analysis of combinatorial interactions among multiple transcription factors and is applied to the promoter regions of ORFs in two organisms, Saccharomyces cerevisiae and Caenorhabditis elegans. The results are available at http://dbms68.csie.ncu.edu.tw/REDB/
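One way to picture the group-specificity test is a 2x2 contingency table of genes inside and outside a co-regulated group, with and without a given site combination. The record does not give the exact statistics used, so the sketch below rests on assumptions: it computes the Matthews correlation coefficient (the reference list cites Matthews, 1975) and a one-sided hypergeometric enrichment p-value, which is a common choice but not confirmed as the thesis's method. All function names and counts are hypothetical.

```python
from math import comb, sqrt


def contingency(in_group_hits, group_size, out_group_hits, out_size):
    """Build the 2x2 table (a, b, c, d).

    a: genes in the group with the site combination
    b: genes in the group without it
    c: genes outside the group with it
    d: genes outside the group without it
    """
    a = in_group_hits
    b = group_size - in_group_hits
    c = out_group_hits
    d = out_size - out_group_hits
    return a, b, c, d


def matthews_cc(a, b, c, d):
    """Matthews correlation coefficient of a 2x2 table (0.0 if undefined)."""
    denom = sqrt((a + b) * (a + c) * (b + d) * (c + d))
    return (a * d - b * c) / denom if denom else 0.0


def hypergeom_enrichment_p(a, b, c, d):
    """P(X >= a) when the group's genes are drawn at random from all genes."""
    n_total = a + b + c + d
    n_group = a + b
    n_hits = a + c
    upper = min(n_group, n_hits)
    favourable = sum(
        comb(n_hits, k) * comb(n_total - n_hits, n_group - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n_total, n_group)


# Hypothetical counts: 8 of 10 genes in the group carry the combination,
# versus 5 of 90 genes in all other groups.
table = contingency(8, 10, 5, 90)
mcc = matthews_cc(*table)
p_value = hypergeom_enrichment_p(*table)
print(f"table={table}  MCC={mcc:.3f}  p={p_value:.2e}")
```

A combination with a high coefficient and a small p-value occurs far more often in the group than chance would suggest, which is the sense of "group-specific" used in the abstract.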


Contents
Chapter 1 Introduction
1.1 Motivations
1.2 Goals
1.3 Background
Chapter 2 Related Works
2.1 Pattern discovery tools
2.2 TRANSFAC database
2.3 GenBank
2.4 RSDB [3]
2.5 Functional related gene groups
2.6 Co-regulated gene groups
Chapter 3 Materials and Methods
3.2 Preprocessing phase
3.3 Prediction phase
3.3.1 Over-represented repeats statistics analysis
3.3.2 Known site homologs and DNA binding motifs discovery
3.4 Annotation phase
3.4.1 Site co-occurrence analysis
3.4.2 Significance filtering
3.4.3 Distance filtering
Chapter 5 Results
5.1 Positional bias of motif groups
5.2 Group specific site combinations
Chapter 6 Summary
References
Appendix
A. Database schema of web
B. Comparison with other approaches
C. Enrichment of gene expression clusters for ORFs within MIPS functional categories

List of Figures
Figure 1. The transcriptional regulation of a gene.
Figure 2. Top periodic clusters, their motifs and overall distribution in all clusters.
Figure 3. System flow.
Figure 4. Map showing the locations of experimentally verified binding sites of Matα2, Gcn4, Pho4, and Gal4 in upstream regions.
Figure 5. Example of a mapping between the motif groups.
Figure 6. A contingency table showing the genes containing sites “aaatat” and “ttgaa”.

List of Tables
Table 1. The transcription factors and their binding sites in TRANSFAC [2] (Release 5.4).
Table 2. Partial statistics of repetitive sequences upstream of the gene set of the amino-acid transport functional category.
Table 3. The number of candidate regulatory sites discovered in the yeast upstream regions.
Table 4. The number of known regulatory sites and over-represented repetitive sequences (OR-repeats) located in the upstream regions of each functional related gene group.
Table 5. The co-occurrences of known and putative regulatory sites in the site combination (CATCC=>ttt.tt) in the co-regulated gene expression cluster “cluster 6”.
Table 6. Group-motifs of gene expression clusters 7, 14, and 30 show high similarity measures and high position bias.
Table 7. Partial significant associations mined in each of the gene expression clusters.
Table 8. The site combinations mined in each functional related gene group.
Table 9. Combination significance and correlation coefficient of combinations.
Table 10. Comparison with other approaches.
Table 11. Periodicity index is a quantitative measure of cell-cycle periodicity.


References
1. Horng, J.T., et al., The repetitive sequence database and mining putative regulatory elements in gene promoter regions. J Comput Biol, 2002. 9(4): p. 621-40.
2. Wingender, E., et al., The TRANSFAC system on gene expression regulation. Nucleic Acids Res, 2001. 29(1): p. 281-3.
3. Horng, J.T., J.H. Lin, and C.Y. Kao. RSDB-A Database of Repetitive Elements in Complete Genomes. in Proceedings of the Atlantic Symposium on Computational Biology and Genome Information Systems & Technology. 2001. Durham, NC, USA.
4. Roth, F.P., et al., Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat Biotechnol, 1998. 16(10): p. 939-45.
5. Bailey, T.L. and C. Elkan, The value of prior knowledge in discovering motifs with MEME. Proc Int Conf Intell Syst Mol Biol, 1995. 3: p. 21-9.
6. Lawrence, C.E., et al., Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 1993. 262(5131): p. 208-14.
7. Sinha, S. and M. Tompa, A statistical method for finding transcription factor binding sites. Proc Int Conf Intell Syst Mol Biol, 2000. 8: p. 344-54.
8. Hughes, J.D., et al., Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol, 2000. 296(5): p. 1205-14.
9. Kel-Margoulis, O.V., et al., COMPEL: a database on composite regulatory elements providing combinatorial transcriptional regulation. Nucleic Acids Res, 2000. 28(1): p. 311-5.
10. Bjorklund, S. and Y.J. Kim, Mediator of transcriptional regulation. Trends Biochem Sci, 1996. 21(9): p. 335-7.
11. Neuwald, A.F., J.S. Liu, and C.E. Lawrence, Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci, 1995. 4(8): p. 1618-32.
12. Bailey, T.L. and C. Elkan, Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol, 1994. 2: p. 28-36.
13. Liu, X., D.L. Brutlag, and J.S. Liu, BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput, 2001: p. 127-38.
14. Eskin, E. and P.A. Pevzner, Finding composite regulatory patterns in DNA sequences. Bioinformatics, 2002. 18 Suppl 1: p. S354-63.
15. GuhaThakurta, D. and G.D. Stormo, Identifying target sites for cooperatively binding factors. Bioinformatics, 2001. 17(7): p. 608-21.
16. Eskin, E., Sparse Sequence Modeling with Applications to Computational Biology and Intrusion Detection. 2002.
17. Kielbasa, S.M., et al., Combining frequency and positional information to predict transcription factor binding sites. Bioinformatics, 2001. 17(11): p. 1019-26.
18. van Helden, J., M. del Olmo, and J.E. Perez-Ortin, Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals. Nucleic Acids Res, 2000. 28: p. 1000-10.
19. van Helden, J., A.F. Rios, and J. Collado-Vides, Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res, 2000. 28(8): p. 1808-18.
20. Mewes, H.W., et al., MIPS: a database for protein sequences, homology data and yeast genome information. Nucleic Acids Res, 1997. 25(1): p. 28-30.
21. Costanzo, M.C., et al., The yeast proteome database (YPD) and Caenorhabditis elegans proteome database (WormPD): comprehensive resources for the organization and comparison of model organism protein information. Nucleic Acids Res, 2000. 28(1): p. 73-6.
22. Tavazoie, S., et al., Systematic determination of genetic network architecture. Nat Genet, 1999. 22(3): p. 281-5.
23. van Helden, J., B. Andre, and J. Collado-Vides, Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol, 1998. 281(5): p. 827-42.
24. Agrawal, R., T. Imielinski, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. in Proc. of the ACM SIGMOD Int'l Conference on Management of Data. 1993. Washington D.C.
25. Agrawal, R. and R. Srikant, Fast Algorithms for Mining Association Rules. 1994, IBM Almaden Research Center. p. 1-32.
26. Zhu, J. and M.Q. Zhang, SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics, 1999. 15(7-8): p. 607-11.
27. Jensen, L.J. and S. Knudsen, Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation. Bioinformatics, 2000. 16(4): p. 326-33.
28. Sudarsanam, P., Y. Pilpel, and G.M. Church, Genome-wide co-occurrence of promoter elements reveals a cis-regulatory cassette of rRNA transcription motifs in Saccharomyces cerevisiae. Genome Res, 2002. 12(11): p. 1723-31.
29. Matthews, B.W., Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta, 1975. 405(2): p. 442-51.
30. Pilpel, Y., P. Sudarsanam, and G.M. Church, Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet, 2001. 29(2): p. 153-9.
