National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

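Author: 王美惠
Author (romanized): Mei-Huei Wang
Thesis title: 以序列型樣探勘技術分析轉錄因子結合區
Thesis title (English): Analysis of Transcription Factor Binding Sites by Using Sequential Pattern Mining
Advisor: 陳春賢
Advisor (romanized): Chun-Hsien Chen
Degree: Master's
Institution: Chang Gung University (長庚大學)
Department: Graduate Institute of Information Management
Discipline: Computer science
Subdiscipline: General computer science
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar, 2005-2006)
Language: Chinese
Number of pages: 115
Keywords (Chinese): 轉錄因子結合區; 序列型樣探勘
Keywords (English): Transcription Factor Binding Site; Sequential Pattern Mining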
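Transcription is one of the key processes in gene expression: it controls how RNA is produced from a gene's DNA sequence. During this process, several transcription factors take part in expressing the gene. They bind, in a specific order, to specific sequences in the region upstream of the gene and thereby affect transcription; these specific sequences are called transcription factor binding sites. Examining how the binding sites are arranged and combined can therefore shed light on the transcriptional regulation of genes.
Many studies use association rule mining to examine associations among combinations of transcription factor binding sites, and the combinations they find can be handed to biologists for experimental validation. However, association rule mining cannot reveal the order in which the binding sites are arranged, and it also generates a large number of rules.
This study uses a database of known transcription factor binding sites to mark, for genes of yeast and Trichomonas vaginalis, the positions at which known binding-site sequences occur in the upstream sequences of genes. It then applies PrefixSpan, a sequential pattern mining algorithm, to find combinations of binding sites that occur in a specific order, discusses how those combinations affect gene regulation, and uses the results to analyze and compare sequential pattern mining with association rule mining. The results show that, compared with association rule mining, sequential pattern mining more efficiently finds the more influential arrangements, combinations, and positions of transcription factor binding sites, reducing the time biologists spend on experimental validation.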
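The marking step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the way IUPAC codes are expanded into regular expressions, and the example upstream sequence are all assumptions made for illustration. Only the three consensus strings (TATTT for YY1, TAAAAT for POU1F1a, TCAAT for an unknown factor) come from the title of Table 4-9 in this record.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def mark_sites(upstream, sites):
    """Return (position, site_name) hits sorted by position.

    `sites` maps a site name (hypothetical labels here) to its consensus
    string. A zero-width lookahead is used so that overlapping
    occurrences of the same site are all reported.
    """
    hits = []
    text = upstream.upper()
    for name, consensus in sites.items():
        pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
        for match in pattern.finditer(text):
            hits.append((match.start(), name))
    return sorted(hits)


def to_ordered_sequence(hits):
    """Drop the positions and keep only the left-to-right order of sites."""
    return [name for _, name in hits]


# Made-up upstream sequence, for illustration only.
example_upstream = "GGTATTTCCTAAAATGGTCAAT"
example_sites = {"YY1": "TATTT", "POU1F1a": "TAAAAT", "unknown": "TCAAT"}
example_hits = mark_sites(example_upstream, example_sites)
print(example_hits)                       # [(2, 'YY1'), (9, 'POU1F1a'), (17, 'unknown')]
print(to_ordered_sequence(example_hits))  # ['YY1', 'POU1F1a', 'unknown']
```

Each gene then contributes one ordered list of site names, which is the kind of input a sequential pattern miner such as PrefixSpan expects. Positions are 0-based offsets into the upstream string; how the thesis measures position relative to the gene start is not stated in this record.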
Transcription is the process by which an RNA product is produced from a given DNA sequence. In this process, transcription factors affect the expression of genes by binding to specific regions with consensus patterns in the upstream region of genes. These consensus patterns are therefore known as transcription factor binding sites (TFBS). By analyzing how transcription factors act on DNA binding sites and how they cooperate in an ordered fashion, we can gain insight into the gene regulation process.
Many computational studies of the combinations of, and relationships between, transcription factor binding sites are based on association rule mining. The results of these studies are provided to biologists for further research. However, the order of the transcription factor binding sites cannot be recovered by association rule mining, and the number of rules it produces is enormous.
This study uses the known TFBS in the TRANSFAC database to mark TFBS positions in the upstream sequences of genes. A sequential pattern mining technique is proposed to analyze the permutations of transcription factor binding sites in the upstream regions of genes. The differences between sequential pattern mining and association rule mining are explored. The results show that sequential pattern mining finds combinations and permutations of transcription factor binding sites more efficiently, and thus saves time that a biologist would otherwise spend on experimental validation.
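To make the contrast between ordered and unordered mining concrete, the following is a minimal sketch of prefix-projected pattern growth in the style of PrefixSpan, together with a plain itemset support count of the kind association rule mining relies on. It assumes every element of a sequence is a single binding site (the general algorithm also handles itemset elements), and the three gene sequences, including the site label GATA, are invented for illustration. It is not the thesis's code or data.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Mine frequent sequential patterns from sequences of single items.

    Returns a dict mapping each frequent pattern (a tuple) to its support,
    i.e. the number of input sequences that contain the pattern as a
    subsequence (order preserved, gaps allowed).
    """
    results = {}

    def mine(prefix, projected):
        # `projected` holds (sequence_index, start_offset) pairs: the
        # suffix of each sequence after the leftmost match of `prefix`.
        first_pos = defaultdict(dict)
        for seq_idx, start in projected:
            seq = sequences[seq_idx]
            for pos in range(start, len(seq)):
                # Keep only the first occurrence of each item per sequence.
                first_pos[seq[pos]].setdefault(seq_idx, pos)
        for item, occurrences in sorted(first_pos.items()):
            if len(occurrences) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(occurrences)
                # Recurse on the database projected by the extended prefix.
                mine(pattern, [(i, p + 1) for i, p in occurrences.items()])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


def itemset_support(sequences, items):
    """Count sequences containing all `items`, ignoring their order."""
    wanted = set(items)
    return sum(1 for seq in sequences if wanted <= set(seq))


# Hypothetical ordered binding-site sequences, one per gene.
genes = [
    ["YY1", "POU1F1a", "unknown"],
    ["YY1", "GATA", "POU1F1a", "unknown"],
    ["POU1F1a", "YY1", "unknown"],
]

patterns = prefixspan(genes, min_support=2)
print(patterns[("YY1", "POU1F1a")])                # 2: ordered pattern
print(itemset_support(genes, ["YY1", "POU1F1a"]))  # 3: order ignored
```

In this toy data the unordered pair {YY1, POU1F1a} appears in all three genes, but the ordered pattern YY1 followed by POU1F1a appears in only two, which is the distinction the abstract draws between the two mining approaches.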
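Table of Contents
Advisor's Recommendation Letter
Oral Defense Committee Certification
Authorization Letter
Acknowledgments
Abstract (Chinese)
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Assumptions and Limitations
1.5 Thesis Organization
Chapter 2 Background Knowledge and Literature Review
2.1 Introduction to the Microorganisms: Yeast and Trichomonas vaginalis
2.1.1 Introduction to Yeast
2.1.2 Introduction to Trichomonas vaginalis
2.2 Fundamentals of Genomics
2.2.1 Gene Expression
2.2.2 The Initiation Stage of Gene Transcription
2.3 Related Biomedical Tools and Databases
2.3.1 EST
2.3.2 BLAST
2.3.3 InterPro
2.3.4 TIGR
2.3.5 TRANSFAC
2.3.6 Classification Schemes for Gene Function Annotation
2.4 Sequential Pattern Mining
2.4.1 Differences Between Sequential Pattern Mining and Association Rule Mining
2.4.2 Problem Definition of Sequential Patterns
2.4.3 Sequential Pattern Mining Algorithms
2.4.4 Maximal Frequent Sequences
2.5 Research and Methods Related to Transcription Factor Binding Sites
Chapter 3 Research Methods
3.1 Experimental Data and Processing
3.1.1 Yeast Gene Data
3.1.2 Trichomonas vaginalis Gene Data
3.1.3 Transcription Factor Binding Site Sequence Data
3.2 Research Workflow
3.2.1 Obtaining Gene Function Annotations and Gene Upstream Sequences for Yeast
3.2.2 Obtaining Gene Functions and Gene Function Annotations for Trichomonas vaginalis
3.2.3 Extracting Gene Upstream Sequences for Trichomonas vaginalis
3.2.4 Locating Transcription Factor Binding Sites
3.2.5 Sequential Pattern Mining of Transcription Factor Binding Sites
Chapter 4 Results and Discussion
4.1 Comparison of Sequential Pattern Mining and Association Rule Mining
4.1.1 Comparison of the Methods on Yeast Data
4.1.2 Comparison of the Methods on Trichomonas vaginalis Data
4.2 Analysis of Sequence Mining Results: Trichomonas vaginalis as an Example
4.3 Discussion
Chapter 5 Conclusions and Future Directions
5.1 Conclusions
5.2 Future Directions
References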
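List of Tables
Table 2-1 World Health Organization statistics on the number of men and women aged 15-49 infected with sexually transmitted diseases in 1999
Table 2-2 Types of BLAST
Table 2-3 BLAST protein databases (2006/06)
Table 2-4 BLAST nucleotide databases (2006/06)
Table 2-5 Number of records in each table of the TRANSFAC database (version 7.0)
Table 2-6 IUPAC codes
Table 2-7 Sequence database
Table 2-8 Example transaction database
Table 2-9 Sequential patterns found in step 1
Table 2-10 Projected databases partitioned in step 2
Table 2-11 Projected databases and sequential patterns generated from the transaction table
Table 2-12 Sequential patterns generated from the transaction table
Table 3-1 Number of binding-site records for each organism
Table 3-2 Example of functional inference and GO gene function annotation for Trichomonas vaginalis genes
Table 3-3 Example of a sequence transaction table
Table 4-1 Description of the data used in the yeast experiment
Table 4-2 Trichomonas vaginalis gene group obtained from the GO term mannosyltransferase activity
Table 4-3 Parameter settings for the Trichomonas vaginalis experiment
Table 4-4 Number of transcription factor binding site sequences found in gene upstream sequences
Table 4-5 Number of transcription factor binding site combinations produced from the Trichomonas vaginalis gene groups
Table 4-6 Transcription factor binding site sequences that occur frequently in the gene group, as found by sequential pattern mining
Table 4-7 Number of occurrences and positions of the binding-site sequence ATTAT in each gene upstream sequence
Table 4-8 Results of sequential pattern mining; only transcription factor binding site combinations with 100% support are listed
Table 4-9 Detailed positions of the TATTT (YY1)–TAAAAT (POU1F1a)–TCAAT (unknown) combination in each gene upstream sequence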
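List of Figures
Figure 2-1 Reference image of yeast
Figure 2-2 Schematic of Trichomonas vaginalis morphology
Figure 2-3 The central dogma
Figure 2-4 Transcription
Figure 2-5 Translation
Figure 2-6 The TFIID complex bound to the DNA promoter region
Figure 2-7 RNA polymerase II and the GTFs form a complex at the promoter region, activating DNA transcription so that transcription begins
Figure 2-8 Different binding sites share the same specific sequence
Figure 2-9 R08440 is a subsequence of R02248
Figure 3-1 Length statistics of transcription factor binding site sequences
Figure 3-2 Data flow diagram
Figure 3-3 Obtaining the gene upstream sequences of yeast
Figure 3-4 Flowchart for extracting gene upstream sequences
Figure 3-5 Two adjacent genes both on the Watson strand; upstream sequences are extracted according to the gene positions
Figure 3-6 Two adjacent genes both on the Crick strand; upstream sequences are extracted according to the gene positions
Figure 3-7 Two adjacent genes on different DNA strands, with the Crick-strand gene to the right of the Watson-strand gene
Figure 3-8 Two adjacent genes on different DNA strands, with the Crick-strand gene to the left of the Watson-strand gene
Figure 3-9 Marking the positions at which transcription factor binding sites occur in a gene upstream sequence
Figure 4-1 Transaction sequence generated for the yeast gene YGR240C
Figure 4-2 Number of rules produced by the Apriori and PrefixSpan algorithms on the yeast data
Figure 4-3 Number of rules produced by the Apriori and PrefixSpan algorithms on the Trichomonas vaginalis data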