2020
DOI: 10.1101/2020.12.01.405886
Preprint

Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data

Abstract: Standard preprocessing of single-cell RNA-seq UMI data includes normalization by sequencing depth to remove this technical variability, and nonlinear transformation to stabilize the variance across genes with different expression levels. Instead, two recent papers propose to use statistical count models for these tasks: Hafemeister and Satija (2019) recommend using Pearson residuals from negative binomial regression, while Townes et al. (2019) recommend fitting a generalized PCA model. Here, we investigate the…

Citations: cited by 15 publications (27 citation statements)
References: 71 publications (207 reference statements)
“…Lause et al (2021) argued that neither the estimation of β_is nor the estimation of one overdispersion per gene are necessary. Instead, Lause et al (2021) suggested treating the log-size factors as offsets (i.e., fixing β_is = 1) and fixing the overdispersion to α = 0.01, because that is roughly the overdispersion they observed in experiments where an RNA solution is homogeneously encapsulated in droplets. Hafemeister and Satija (2020) responded that estimating a gene-wise coefficient for the size factor "allows sctransform to adapt to artifacts and biases" and that fixing the overdispersion to a small value over-emphasizes the variation of highly abundant housekeeping genes.…”
Section: Pearson Residuals (mentioning)
confidence: 99%
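
The offset model described in this excerpt can be sketched in a few lines. This is a minimal illustration, not the cited authors' implementation: it assumes a cells-by-genes count matrix, takes the expected count of each entry as (cell total × gene total) / grand total, and uses the negative binomial variance μ + αμ² with the fixed α = 0.01 quoted above. The function name `pearson_residuals` is hypothetical, and the sketch assumes every cell and every gene has at least one count.

```python
import numpy as np

def pearson_residuals(counts, alpha=0.01):
    """Pearson residuals under an offset-only negative binomial model.

    counts: array of shape (cells, genes) holding UMI counts.
    alpha:  fixed overdispersion shared by all genes (assumed 0.01 here).
    """
    counts = np.asarray(counts, dtype=float)
    cell_totals = counts.sum(axis=1, keepdims=True)  # sequencing depth per cell
    gene_totals = counts.sum(axis=0, keepdims=True)  # total count per gene
    # Expected count when depth enters as a pure offset (no fitted slope).
    mu = cell_totals * gene_totals / counts.sum()
    # Negative binomial variance with fixed overdispersion alpha.
    variance = mu + alpha * mu**2
    return (counts - mu) / np.sqrt(variance)

# Usage: a count matrix that is exactly depth x gene-proportion has zero residuals.
flat = np.outer([1, 2], [3, 4])
print(pearson_residuals(flat))
```

Because nothing is fitted per gene, the residuals follow directly from the row and column sums of the matrix, which is the sense in which neither a size-factor coefficient nor a per-gene overdispersion needs to be estimated.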
“…We next focused on the application of negative binomial error models, and considered different strategies for parameterizing the level of overdispersion associated with each gene. Recent work (22) suggested that a negative binomial model with a fixed parameterization (for example, inverse overdispersion parameter θ = 100) could be applied to all scRNAseq datasets to achieve effective variance stabilization. To explore whether a single value of θ could be applied to diverse scRNA-seq datasets, we first independently fit θ estimates for each gene in each dataset using a GLM with negative binomial errors (NB GLM), using library size as an offset to account for variation in cellular sequencing depth.…”
Section: The Level Of Overdispersion Varies Substantially Across Datasets (mentioning)
confidence: 99%
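
The excerpts on this page use two parameterizations of the same negative binomial variance. Assuming the common convention in which θ is the inverse of the overdispersion α (the "inverse overdispersion parameter" of the excerpt above), the fixed values θ = 100 and α = 0.01 describe the same model:

```latex
\operatorname{Var}(y) = \mu + \alpha\,\mu^{2} = \mu + \frac{\mu^{2}}{\theta},
\qquad \alpha = \frac{1}{\theta},
\qquad \theta = 100 \;\Longleftrightarrow\; \alpha = 0.01 .
```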
“…We also considered the findings from (22), which proposed that θ values should not vary as a function of gene abundance, and suggested that the relationship between these two variables was driven entirely by biases in the parameter estimation procedure, especially when analyzing lowly expressed genes. We first confirmed that lowly expressed genes, particularly those with average abundance < 0.1 UMI/cell, posed difficulties for parameter estimation.…”
Section: Gene Overdispersion Varies As a Function Of Abundance (mentioning)
confidence: 99%
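
One way to see why lowly expressed genes are hard cases for estimating θ is to ask how much of a gene's variance the overdispersion term contributes. The sketch below is an illustration under the variance model μ + μ²/θ, not the analysis performed in the citing paper; the function name `excess_variance_fraction` is hypothetical.

```python
def excess_variance_fraction(mu, theta):
    """Share of the negative binomial variance due to overdispersion.

    With Var = mu + mu**2 / theta, the overdispersion share is
    (mu**2 / theta) / (mu + mu**2 / theta), which simplifies to mu / (theta + mu).
    """
    return mu / (theta + mu)

# Compare a lowly expressed gene (0.1 UMI/cell) with abundant ones, assuming theta = 100.
for mean_count in (0.1, 1.0, 10.0, 100.0):
    share = excess_variance_fraction(mean_count, theta=100.0)
    print(f"mean={mean_count:>6}: overdispersion share of variance = {share:.4f}")
```

At 0.1 UMI/cell the overdispersion term accounts for roughly a tenth of a percent of the variance under these assumptions, so the data carry very little information about θ for such genes.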