2021
DOI: 10.1186/s13059-021-02451-7

Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data

Abstract: Background: Standard preprocessing of single-cell RNA-seq UMI data includes normalization by sequencing depth to remove this technical variability, and nonlinear transformation to stabilize the variance across genes with different expression levels. Instead, two recent papers propose to use statistical count models for these tasks: Hafemeister and Satija (Genome Biol 20:296, 2019) recommend using Pearson residuals from negative binomial regression, while Townes et al. (Genome Biol 20:295, 2019) …

Cited by 90 publications (144 citation statements)
References 64 publications
“…However, we show that this excess of variance is simply due to the variability in the total number of reads per cell, and confirm that the Poisson law is well suited to model droplet-based scRNA-seq. Our conclusions are in line with recently published reanalyses of these control datasets [21,22]. We note that our demonstration is limited to droplet-based scRNA-seq as we only found control datasets available for these techniques.…”
Section: Poisson Distribution Is Sufficient To Model Droplet-based scRNA-seq
Citation type: supporting
confidence: 92%
“…The authors find values of φ close to 0.01. Another reanalysis of these datasets comparing their behaviours with simulated datasets following negative binomial of known dispersion conclude that they are not similar to simulated datasets with Poisson model but were consistent with φ values around 0.01 which makes the Poisson model sufficient in practice [22].…”
Section: Poisson Distribution Is a Good Approximation For Droplet-based Single-cell RNA-seq Data
Citation type: mentioning
confidence: 88%
“…We next focused on the application of negative binomial error models, and considered different strategies for parameterizing the level of overdispersion associated with each gene. Recent work [22] suggested that a negative binomial model with a fixed parameterization (for example, inverse overdispersion parameter θ = 100) could be applied to all scRNA-seq datasets to achieve effective variance stabilization. To explore whether a single value of θ could be applied to diverse scRNA-seq datasets, we first independently fit θ estimates for each gene in each dataset using a GLM with negative binomial errors (NB GLM), using library size as an offset to account for variation in cellular sequencing depth.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Even for methods that assume a NB distribution, different groups propose different methods to parameterize their model. For example, a recent study [22] argued that fixing the NB inverse overdispersion parameter θ to a single value is an appropriate estimate of technical overdispersion for all genes in all scRNA-seq datasets, while others [23] propose learning unique parameter values for each gene in each dataset. This lack of consensus is further exemplified by the scvi-tools [11, 24] suite, which supports nine different methods for parameterizing error models.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In order to apply TuBA to single-cell RNA sequencing data, first, genes that had UMI count of zero in more than 97.5% of the cells were removed (Minimum population with non-zero counts were determined based on TuBA's percentile set size parameter of 2.5% in this analysis). Following recommendation by Lause et al (2021), the UMI counts were transformed to Pearson residuals defined by,…”
Section: Single-cell Biclustering Analysis
Citation type: mentioning
confidence: 99%
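
The last statement describes two steps: dropping genes with zero UMI counts in more than 97.5% of cells, then transforming counts to Pearson residuals (the formula itself is cut off in the quotation). The following is a minimal sketch of such a pipeline, not the citing authors' code. It assumes the analytic negative binomial Pearson residual, with expected counts taken as (cell total × gene total) / grand total, a fixed θ = 100 as quoted in an earlier statement, and residuals clipped to ±√n for n cells. The function name, its parameters, and the choice to compute totals after gene filtering are illustrative assumptions.

```python
import numpy as np


def analytic_pearson_residuals(counts, theta=100.0, max_zero_frac=0.975):
    """Hypothetical helper: counts is a cells x genes array of UMI counts.

    Assumes every cell has at least one count in the retained genes;
    otherwise the expected count is zero and the residual is undefined.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]

    # Drop genes whose counts are zero in more than max_zero_frac of cells.
    zero_frac = (counts == 0).mean(axis=0)
    keep = zero_frac <= max_zero_frac
    x = counts[:, keep]

    # Expected counts under a model with only depth and gene effects:
    # mu_cg = (total of cell c) * (total of gene g) / (grand total).
    cell_totals = x.sum(axis=1, keepdims=True)
    gene_totals = x.sum(axis=0, keepdims=True)
    mu = cell_totals * gene_totals / x.sum()

    # Negative binomial Pearson residual with fixed inverse overdispersion theta.
    residuals = (x - mu) / np.sqrt(mu + mu**2 / theta)

    # Clip to +/- sqrt(n_cells) so single extreme cells do not dominate.
    bound = np.sqrt(n_cells)
    return np.clip(residuals, -bound, bound), keep
```

Letting θ grow very large recovers the Poisson case discussed in the first two statements, since the μ²/θ term in the variance vanishes.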