2015
DOI: 10.1093/bioinformatics/btv425

Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment

Abstract: Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. Results: A 48-r…

Citations: cited by 81 publications (96 citation statements)
References: 25 publications
“…Jaccard similarity comparisons (Levandowsky and Winter 1971) highlighted the effect of the variation in read numbers and complexity across samples. We increased replicate and sample comparability by applying a subsampling normalization (Efron and Tibshirani 1993;Gierlinśki et al 2015;Schurch et al 2016) without replacement (I Mohorianu, A Bretman, DT Smith, EK Fowler, T Dalmay, T Chapman, in prep.) at a fixed total (50 M, Supplemental Table S2A).…”
Section: Bioinformatics
mentioning
confidence: 99%
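The subsampling normalization mentioned in the statement above (drawing reads without replacement down to a fixed total) can be illustrated with a minimal sketch. This is not the citing authors' implementation, which the statement does not describe. The function name `subsample_counts`, the per-feature count vector, and the seed are assumptions made for illustration; the draw uses NumPy's multivariate hypergeometric sampler, which is one standard way to sample without replacement from grouped counts.

```python
import numpy as np


def subsample_counts(counts, target_total, seed=0):
    """Subsample a vector of per-feature read counts without replacement.

    Hypothetical helper: each read is treated as a ball whose colour is the
    feature it belongs to, and `target_total` balls are drawn without
    replacement, so no feature can receive more reads than it started with.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if target_total > int(counts.sum()):
        raise ValueError("target_total exceeds the number of available reads")
    rng = np.random.default_rng(seed)
    # One draw of `target_total` reads; the result has the same shape as
    # `counts` and sums to `target_total`.
    return rng.multivariate_hypergeometric(counts, target_total)


if __name__ == "__main__":
    sample = np.array([500, 300, 200])
    print(subsample_counts(sample, 100))
```

Applying the same `target_total` to every sample (the statement quotes a fixed total of 50 M) puts all samples at the same depth; samples with fewer reads than the target cannot be subsampled this way and raise an error in this sketch.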
“…Several recent studies have critically evaluated alternative methodologies for differential transcript and gene expression to determine the relative merits of these approaches [66][67][68][69]. Soneson et al [69] demonstrated that differential gene expression (DGE) analyses produce more accurate results than differential transcript expression (DTE) analyses.…”
Section: Differential Gene and Transcript Expression Analyses
mentioning
confidence: 99%
“…Two analyses were conducted in R version 3.3.1 [73] using edgeR version 3.16.1, a Bioconductor package (release 3.4) that evaluates statistical differences in count data between treatment groups [70,71]. Our first method utilized tximport, an R package developed by Soneson et al [67], which incorporates transcriptome mapping-rate estimates with a gene count matrix to enable downstream DGE analysis. The authors assert that such transcriptome mapping can generate more accurate estimates of DGE than traditional pipelines [69].…”
Section: Differential Gene and Transcript Expression Analyses
mentioning
confidence: 99%
“…In real RNA-Seq data, the observed variance of the read counts is significantly greater than the sample variance modeled by the distribution (Oberg et al, 2012) due to outliers. Two studies based on real RNA-Seq data show that approximately 5% genes from the same biological condition have significantly higher variance in transcript expression than expected due to outliers (Gierlinski et al, 2015;Oberg et al, 2012). To simulate datasets that reflect real RNA-Seq data as much as possible, 5% genes are selected as genes that contain outliers in the simulated samples.…”
mentioning
confidence: 99%
“…To simulate datasets that reflect real RNA-Seq data as much as possible, 5% genes are selected as genes that contain outliers in the simulated samples. Extreme high values of transcript expression are usually detected in approximately 10% of real RNA-Seq samples (Gierlinski et al, 2015). To simulate such extreme high expression values, we allow the randomly drawn read counts r_{t,i} for transcript t from the selected outlier genes to have a 10% probability of being amplified from 5 to 10 times in sample i as done in (Zhou et al, 2014).…”
mentioning
confidence: 99%
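The outlier scheme described in the statement above (5% of genes flagged as outlier genes; each count from those genes amplified 5 to 10 times with 10% probability) can be sketched as follows. This is an illustrative sketch only, not the citing authors' code. The function name `add_outliers`, the transcripts-by-samples matrix layout, the uniform draw of the amplification factor, and the choice to keep at least one outlier gene are all assumptions that the statement does not specify.

```python
import numpy as np


def add_outliers(counts, gene_ids, gene_fraction=0.05, spike_prob=0.10,
                 min_factor=5.0, max_factor=10.0, seed=0):
    """Inject extreme counts into a transcripts x samples count matrix.

    Hypothetical helper. `gene_ids[t]` is the gene that transcript t belongs
    to. Returns the modified integer matrix and the selected outlier genes.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    gene_ids = np.asarray(gene_ids)

    # Select a fraction of genes as outlier genes (assumption: at least one).
    genes = np.unique(gene_ids)
    n_outlier = max(1, int(round(gene_fraction * genes.size)))
    outlier_genes = rng.choice(genes, size=n_outlier, replace=False)

    # Rows (transcripts) that belong to an outlier gene.
    in_outlier_gene = np.isin(gene_ids, outlier_genes)

    # Each (transcript, sample) count in those rows is amplified
    # independently with probability `spike_prob`.
    spike = rng.random(counts.shape) < spike_prob
    spike &= in_outlier_gene[:, None]

    # Assumption: the factor is drawn uniformly between the two bounds.
    factors = rng.uniform(min_factor, max_factor, size=counts.shape)
    out = np.where(spike, np.rint(counts * factors), counts)
    return out.astype(np.int64), outlier_genes


if __name__ == "__main__":
    base = np.full((200, 6), 10)           # 200 transcripts, 6 samples
    gene_of = np.repeat(np.arange(100), 2)  # 2 transcripts per gene
    spiked, chosen = add_outliers(base, gene_of)
    print(len(chosen), int((spiked != base).sum()))
```

Counts from genes that were not selected are left untouched, so the extra variance is confined to the flagged genes.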