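Graduate student: Tsan-Huang Shih (施燦煌)
Thesis title: Functional Annotation of Transcriptomic Data and Cross-species Comparison (轉錄體功能註解及跨物種比對分析)
Advisor: Tun-Wen Pai (白敦文)
Degree: Master's
Institution: National Taiwan Ocean University (國立臺灣海洋大學)
Department: Department of Computer Science and Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Number of pages: 60
Keywords (Chinese): Gene Ontology; Biological Pathway; Protein Domain; Cross-species Comparison; Systems Biology
Keywords (English): RNA-seq; Gene Ontology; Biological Pathway; Protein Domain; Cross-species Comparison; Systems Biology; BiMFG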
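Transcriptome sequencing, also known as RNA-seq, refers to the rich transcriptome data produced by next-generation sequencing instruments. These data support the detection of gene functions in a wide range of organisms, and comparing transcriptome sequences obtained at different time points reveals time-dependent changes in gene expression or differences in gene expression between strains. However, many RNA-seq studies perform statistical analysis only on individual genes and often overlook the functional behavior of gene groups in the transcriptome or the mutual regulation among groups of genes on biological pathways. This study therefore develops a transcriptome analysis system that analyzes and presents data systematically through gene function annotation and classification methods such as Gene Ontology, biological signaling pathways, and protein domains. After transcriptome sequences are input, the system performs sequence alignment, identification, and gene clustering, then annotates functional differences across species and analyzes changes in biological pathways. For different RNA-seq experimental datasets, the system can also quantify differences in transcript-level gene expression. To avoid expression bias introduced by different experiments, this study first analyzes housekeeping genes, which have characteristically stable expression levels, unifies the housekeeping gene expression levels across experiments, then normalizes all transcriptome data from each experiment, and finally presents the systems-biology differences as statistical charts. This thesis also provides a complete bioinformatics analysis system (BiMFG) that lets users further analyze biological sequences and structural alignment features. The system database currently collects mainly sequences and structures of model species related to marine and aquatic organisms, and provides biological data retrieval and multiple alignment tools for comparison and functional annotation of primary sequences, secondary structures, and three-dimensional structures.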
High-throughput sequencing technologies have been applied to complementary DNA sequencing in an approach termed RNA-seq. RNA-seq data are used not only to analyze unknown genes or functions in incompletely sequenced genomes but also to observe variation in differential gene expression among multiple transcriptomes sequenced at different time points or from different strains. However, most RNA-seq analyses focus on quantifying the recognized genes or on a small set of genes related to one selected function at a time. As a result, important associated information may be missed because of the limited scale of the gene analysis. We therefore developed an integrated system that analyzes transcriptomic data using functional classification methods including Gene Ontology (GO), biological pathways, and protein domains/families. The system annotates and clusters assembled contigs from various species and visualizes their functional relationships through a systems biology representation for cross-species comparison. In addition, the system supports analysis of differential gene expression among RNA-seq experiments based on read coverage counts. First, to avoid bias in gene expression levels among experiments, conserved homologous housekeeping genes and their corresponding coverage counts are selected as standards for normalization. Then, the variations in gene expression are annotated, compared, and visualized with several statistical graphs in terms of a systems biology representation.
Finally, the assembled contigs can be analyzed in detail with an integrated and comprehensive bioinformatics system (BiMFG). It includes an information retrieval function for comparison with marine and freshwater model species, and it provides analytical tools at different levels of biological sequence: primary sequences, secondary structures, and tertiary structures.
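The housekeeping-gene normalization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption for illustration: the function names, the use of an arithmetic mean over housekeeping coverage, and the choice of the cross-experiment average as the common target. The gene names and counts are made-up example data, not results from the thesis.

```python
from statistics import mean


def housekeeping_scale_factors(coverage, housekeeping_genes):
    """Return one scale factor per experiment.

    coverage: {experiment: {gene: coverage_count}} (hypothetical layout)
    After scaling, the mean housekeeping coverage is the same in every
    experiment (assumed target: the average of the per-experiment means).
    """
    hk_means = {}
    for experiment, counts in coverage.items():
        values = [counts[g] for g in housekeeping_genes if g in counts]
        if not values or mean(values) == 0:
            raise ValueError(f"no usable housekeeping coverage in {experiment}")
        hk_means[experiment] = mean(values)
    target = mean(hk_means.values())
    return {experiment: target / m for experiment, m in hk_means.items()}


def normalize(coverage, housekeeping_genes):
    """Scale every gene's coverage count by its experiment's factor."""
    factors = housekeeping_scale_factors(coverage, housekeeping_genes)
    return {
        experiment: {gene: c * factors[experiment] for gene, c in counts.items()}
        for experiment, counts in coverage.items()
    }


if __name__ == "__main__":
    # Made-up coverage counts for two experiments.
    coverage = {
        "exp_a": {"ACT1": 100, "TDH3": 300, "GENE_X": 50},
        "exp_b": {"ACT1": 200, "TDH3": 600, "GENE_X": 400},
    }
    normalized = normalize(coverage, ["ACT1", "TDH3"])
    fold_change = normalized["exp_b"]["GENE_X"] / normalized["exp_a"]["GENE_X"]
    print(normalized)
    print("GENE_X fold change (exp_b / exp_a):", fold_change)
```

In this toy data the second experiment has twice the housekeeping coverage of the first, so the raw eightfold difference in GENE_X shrinks to a fourfold difference after scaling. A real pipeline would likely need a more robust summary than the arithmetic mean, for example a median or geometric mean, when individual housekeeping genes are noisy.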
ABSTRACT (CHINESE)
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
2. FUNCTIONAL ANNOTATIONS FOR GENES
2.1 GENE ONTOLOGY
2.2 BIOLOGICAL PATHWAY
2.3 PROTEIN DOMAIN/FAMILY
3. SYSTEM OVERVIEW
4. FUNCTIONAL ANNOTATION OF TRANSCRIPTOMIC DATA
4.1 TRANSCRIPTOME ASSEMBLY
4.2 VERIFICATION AND RETRIEVAL
4.3 FUNCTIONAL ANNOTATION
5. RNA-SEQ COVERAGE ACCOUNTS ANALYSIS
5.1 MULTIPLE EXPERIMENTAL RNA-SEQ ANALYSIS
5.2 REFERENCE MAPPING AND NORMALIZATION
5.3 BIOLOGICAL PATHWAY MAPPING
5.4 GENE ONTOLOGY CLUSTER
6. BIMFG SYSTEM
6.1 TOOLS FOR REMOTE SEQUENCE AND STRUCTURE RETRIEVAL
6.2 MULTIPLE GENOMIC DATA COMPARISON
6.3 LENGTH ENCODED SECONDARY STRUCTURE TRANSFORMATION
6.4 MULTIPLE INDEXING SEQUENCE ALIGNMENT
6.5 MULTIPLE STRUCTURAL ALIGNMENT
6.6 DATABASE UPDATE
7. EXPERIMENTAL RESULTS
7.1 FUNCTIONAL ANNOTATION OF TRANSCRIPTOMIC DATA
7.1.1 Plant UCO verification and A. thaliana gene retrieval
7.1.2 Example of cross-species comparison
7.1.3 Example of transcript isoform detection
7.1.4 Example of functional annotation
7.1.5 Database in Functional Annotation of Transcriptomic Data system
7.2 RNA-SEQ COVERAGE ACCOUNTS ANALYSIS
7.2.1 Differential expression in hybrid and co-cultural datasets
7.2.2 Differential expression in different time points
7.3 BIMFG SYSTEM
7.3.1 Sequence retrieval by LESS
7.3.2 Example of MSA tool
7.3.3 Example of MStA tool
7.3.4 BiMFG update
8. CONCLUSIONS
REFERENCES