National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

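Graduate student: 曾禹翔
Graduate student (English name): Yu-Shiang Zeng
Thesis title: 以貝氏分析方法來偵測轉錄體定序資料之顯著基因
Thesis title (English): Identification of Differentially Expressed Genes of RNA-Seq Data based on Bayesian Approaches
Advisor: 蔡政安
Oral defense committee: 劉仁沛, 劉力瑜, 謝叔蓉
Oral defense date: 2015-06-25
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Agronomy
Discipline: Agricultural Sciences
Field: General Agriculture
Thesis type: Academic thesis
Year of publication: 2015
Graduation academic year: 103 (ROC calendar)
Language: Chinese
Number of pages: 84
Keywords (Chinese): 轉錄體定序資料, 基因表現量, 貝氏分析, 對數線性模型
Keywords (English): RNA-seq, Gene expression, Bayesian inference, Log-linear model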
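As next-generation sequencing (NGS) technology has developed ever faster and matured in recent years, it has come into wide use in many fields, such as medicine, agriculture, and biotechnology. NGS can be used for whole-genome sequencing, for resequencing known species, and for investigating biological theory; one of its important applications is transcriptome sequencing (RNA-seq) data. RNA-seq data are commonly used to test gene expression levels, and in recent years they have gradually replaced microarray data as an indicator for studying gene expression. However, RNA-seq data are discrete, and their variance tends to exceed their mean, a phenomenon called over-dispersion. The negative binomial model is usually used to handle over-dispersion, but estimating the parameters of the model involves many statistical methods. DESeq, edgeR, and DSS are methods commonly used in such analyses. All of these methods, however, estimate the parameters by point estimation and do not take uncertainty into account. In this thesis, we build two models, a log-linear model and a Bayesian hierarchical model, and use Markov chain Monte Carlo (MCMC) to obtain the parameters of interest, from which differentially expressed genes can be identified. Finally, we use simulated data and real data to evaluate DESeq, edgeR, DSS, and our methods. We find that when the numbers of replicates in the groups are close or equal, our log-linear model performs better than the other methods; when the numbers of replicates are extremely unbalanced, we recommend the median estimator for testing.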
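
The over-dispersion described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the thesis: it assumes the standard Gamma-Poisson construction of the negative binomial distribution, in which a Poisson rate is itself Gamma distributed, so that the count variance becomes mu + phi * mu^2 instead of mu. The values of `mu` and `phi` and the helper names are illustrative assumptions.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_gamma_poisson(mu, phi, rng):
    """Draw a count whose Poisson rate is itself Gamma distributed."""
    # shape = 1/phi and scale = mu*phi give the rate mean mu and variance phi*mu^2
    rate = rng.gammavariate(1.0 / phi, mu * phi)
    return sample_poisson(rate, rng)


rng = random.Random(1)
mu, phi = 10.0, 0.5  # illustrative values, not taken from the thesis
counts = [sample_gamma_poisson(mu, phi, rng) for _ in range(20000)]

sample_mean = statistics.fmean(counts)
sample_var = statistics.variance(counts)
print(f"mean = {sample_mean:.2f}")
print(f"variance = {sample_var:.2f} (theory: {mu + phi * mu**2:.2f})")
```

A pure Poisson model would force the variance to equal the mean, so the much larger simulated variance shows why a dispersion parameter is needed for count data of this kind.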

With the rapid development of next-generation sequencing (NGS) technology, fields such as medical science, agriculture, and biotechnology have advanced considerably. NGS makes whole-genome sequencing and de novo sequencing possible, allowing biological theory to be explored, and RNA-seq is one of its core applications. RNA-seq data are used to measure gene expression levels and to test whether a specific gene is differentially expressed. RNA-seq has gradually replaced microarray technology as an important benchmark for gene expression testing. However, RNA-seq read counts are discrete and exhibit over-dispersion (the variance of the data is larger than the mean).
The negative binomial model is applied to deal with over-dispersion, but parameter estimation then becomes a further issue. Current analysis software for RNA-seq data, such as DESeq, edgeR, and DSS, uses only point estimates of the parameters and does not account for the uncertainty in RNA-seq data.
Here, we use the Markov chain Monte Carlo (MCMC) method to estimate the parameters relevant to detecting differentially expressed genes. At the end of the thesis, we compare the performance of DESeq, edgeR, DSS, and our method on both simulated and real RNA-seq data. Our log-linear model outperforms DESeq, edgeR, and DSS when the numbers of replicates in the groups are close or equal. When the numbers of replicates are extremely unbalanced, we suggest the median estimator as the appropriate method for detecting differentially expressed genes.
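
To make the MCMC idea concrete, the following is a minimal sketch and not the model used in the thesis, whose details are not given in this record. It assumes a single gene, two groups, a negative binomial likelihood with a known dispersion `phi`, a log-linear mean log(mu) = alpha + beta * group, wide normal priors, and a random-walk Metropolis sampler. The counts, the prior width, and the step size are all made-up illustrative choices.

```python
import math
import random


def nb_logpmf(y, mu, phi):
    """Negative binomial log-probability with mean mu and dispersion phi."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def log_posterior(alpha, beta, counts, groups, phi, prior_sd=10.0):
    """Log-linear mean plus independent N(0, prior_sd^2) priors (assumed)."""
    lp = -(alpha ** 2 + beta ** 2) / (2.0 * prior_sd ** 2)
    for y, x in zip(counts, groups):
        lp += nb_logpmf(y, math.exp(alpha + beta * x), phi)
    return lp


def metropolis(counts, groups, phi, n_iter=6000, burn_in=1000, step=0.15, seed=0):
    """Random-walk Metropolis; returns post-burn-in draws of beta."""
    rng = random.Random(seed)
    alpha = math.log(sum(counts) / len(counts) + 0.5)
    beta = 0.0
    current = log_posterior(alpha, beta, counts, groups, phi)
    draws = []
    for i in range(n_iter):
        a_new = alpha + rng.gauss(0.0, step)
        b_new = beta + rng.gauss(0.0, step)
        proposed = log_posterior(a_new, b_new, counts, groups, phi)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            alpha, beta, current = a_new, b_new, proposed
        if i >= burn_in:
            draws.append(beta)
    return draws


# Hypothetical read counts for one gene: four replicates per group
counts = [12, 9, 15, 11, 40, 35, 52, 44]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

beta_draws = metropolis(counts, groups, phi=0.1)
post_mean = sum(beta_draws) / len(beta_draws)
prob_up = sum(b > 0 for b in beta_draws) / len(beta_draws)
print(f"posterior mean log fold change: {post_mean:.2f}")
print(f"posterior P(beta > 0): {prob_up:.3f}")
```

Unlike a point estimate, the retained draws describe the whole posterior distribution of the log fold change, so a gene can be ranked by a posterior probability. This is the kind of uncertainty that the abstract says point-estimate methods leave out.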

Abstract (Chinese)
Abstract
1 Introduction
1.1 Brief Overview of RNA-seq Studies
1.2 Challenges in Analysis Methods for RNA-seq Data
1.3 Contributions of Our Proposed Method
2 Review of Current Methods
2.1 The Dispersion Shrinkage for Sequencing (DSS)
2.2 Moderated Statistical Tests for Assessing Differences in Tag Abundance (edgeR)
2.3 Differential Expression Analysis for Sequence Count Data (DESeq)
3 The Proposed Methods
3.1 Gamma-Poisson Model
3.2 Log-Linear Model
4 Numerical Studies
4.1 Simulation Studies
4.2 Control of Type I Error Rate
4.3 Estimation of ϕ
4.4 Accuracy of DE Detection
4.5 FDR
4.6 Test Performance
4.7 Real Data Analysis
5 Discussion and Conclusions
Bibliography

