Single-cell RNA-seq denoising using a deep count autoencoder

Eraslan, Gökçen; Simon, Lukas M.; Mircea, Maria; Mueller, Nikola S.; Theis, Fabian J.

doi:10.1038/s41467-018-07931-2

Cited by 739 publications

(500 citation statements)

References 50 publications

(67 reference statements)


“…We demonstrate that a regularization step, a common step in bulk RNA-seq analysis [Robinson et al, 2010, McCarthy et al, 2012] where parameter estimates are pooled across genes with similar mean abundance, can effectively overcome this challenge and yield reproducible models. Importantly, many statistical and deep-learning methods designed for single cell RNA-seq data utilize a negative binomial (or zero-inflated negative binomial) error model [Lopez et al, 2018, Eraslan et al, 2019]. Our results suggest that each of these methods could benefit by substituting a regularized model, and that including an additional parameter for zero-inflation could exacerbate the risk of overfitting.…”

Section: Discussion (mentioning)

confidence: 91%

“…For example, ZINB-WaVE by Risso et al [2018] models counts as ZINB in a special variant of factor analysis. scVI and DCA also use the ZINB noise model [Lopez et al, 2018, Eraslan et al, 2019], either for normalization and dimensionality reduction in Bayesian hierarchical models, or for a denoising autoencoder. These pioneering approaches extend beyond pre-processing and normalization, but rely on the accurate estimation of per-gene error models.…”

Section: Introduction (mentioning)

confidence: 99%


Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression

Hafemeister; Satija. 2019. Preprint.

Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from 'regularized negative binomial regression', where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure omits the need for heuristic steps including pseudocount addition or log-transformation, and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat.
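As a rough illustration of the Pearson residuals mentioned in the abstract above, the following Python sketch computes a negative binomial Pearson residual for one gene in one cell. The function names, the log10 depth covariate, the coefficient values, and the variance form mu + mu^2 / theta are assumptions made for illustration and are not taken from the sctransform package. The regularization step (pooling parameter estimates across genes with similar abundance) is not shown.

```python
import math


def expected_mean(intercept, slope, depth):
    """Expected count under a log-link GLM with log10 sequencing depth
    as the only covariate (hypothetical coefficients)."""
    return math.exp(intercept + slope * math.log10(depth))


def nb_pearson_residual(count, mu, theta):
    """Pearson residual under a negative binomial model.

    Assumes the variance is mu + mu**2 / theta, where theta is the
    inverse-dispersion parameter (an assumed parameterization).
    """
    variance = mu + mu * mu / theta
    return (count - mu) / math.sqrt(variance)


# Hypothetical values for one gene in one cell.
mu = expected_mean(intercept=-2.0, slope=1.0, depth=5000)
residual = nb_pearson_residual(count=3, mu=mu, theta=10.0)
print(f"mu = {mu:.3f}, Pearson residual = {residual:.3f}")
```

A residual near zero means the observed count matches what the model expects at that sequencing depth; large positive or negative residuals flag counts that depth alone does not explain.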


“…Systems biologists have derived and adapted numerical and computational methods for dimensionality reduction to allow for low-dimensional representation of single-cell data and deduction of cell states and fates (Van der Maaten and Hinton, 2008; Pierson and Yau, 2015; Linderman et al, 2017; Wang et al, 2017; Becht et al, 2018; Ding, Condon and Shah, 2018; Lopez et al, 2018; Risso et al, 2018; Eraslan et al, 2019; Townes et al, 2019).…”

Section: Discussion (mentioning)

confidence: 99%

“…However, given the distribution and sparsity of scRNA-seq data, complex, nonlinear transformations are often required to capture and visualize expression patterns. Unsupervised machine learning techniques and, more recently, deep learning methods, are being rapidly developed to assist researchers in single-cell transcriptomic analysis (Van der Maaten and Hinton, 2008; Pierson and Yau, 2015; Linderman et al, 2017; Wang et al, 2017; Becht et al, 2018; Ding, Condon and Shah, 2018; Lopez et al, 2018; Risso et al, 2018; Eraslan et al, 2019; Townes et al, 2019). Because these techniques condense cell features in the native space to a small number of latent dimensions for visualization, lost information can result in exaggerated or dampened cell-cell similarity.…”

Section: Introduction (mentioning)

confidence: 99%

A quantitative framework for evaluating single-cell data structure preservation by dimensionality reduction techniques

Heiser; Lau. 2019. Preprint.

Summary: High-dimensional data, such as those generated using single-cell RNA sequencing, present challenges in interpretation and visualization. Numerical and computational methods for dimensionality reduction allow for low-dimensional representation of genome-scale expression data for downstream clustering, trajectory reconstruction, and biological interpretation. However, a comprehensive and quantitative evaluation of the performance of these techniques has not been established. We present an unbiased framework that defines metrics of global and local structure preservation in dimensionality reduction transformations. Using discrete and continuous scRNA-seq datasets, we find that input cell distribution and method parameters are largely determinant of global, local, and organizational data structure preservation by eleven published dimensionality reduction methods. Code available at github.com/KenLauLab/DR-structure-preservation allows for rapid evaluation of further datasets and methods.

“…The zero-inflated negative binomial distribution is sometimes used to model single-cell RNAseq data as it can account for the abundance of zeros observed in such data [Miao et al, 2018, Eraslan et al, 2019]. A random variable y is distributed zero-inflated negative binomial, denoted y ∼ ZINB(π, µ, φ), if it is generated by the following hierarchical process:…”

Section: S12 Generalizing the Poisson Assumption (mentioning)

confidence: 99%
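The quoted statement is cut off before its definition, so the following Python sketch does not reproduce it. It evaluates the probability mass function of a zero-inflated negative binomial under one common parameterization, assumed here: with probability π the count is a structural zero, otherwise it is drawn from a negative binomial with mean µ and variance µ + φµ². The function names and the reading of φ as an overdispersion parameter are assumptions for illustration and may differ from the cited paper.

```python
import math


def nb_log_pmf(k, mu, phi):
    """Log-PMF of a negative binomial with mean mu > 0 and variance
    mu + phi * mu**2 (assumed parameterization; size r = 1 / phi)."""
    r = 1.0 / phi
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )


def zinb_pmf(k, pi, mu, phi):
    """PMF of a zero-inflated negative binomial.

    With probability pi the count is a structural zero; otherwise it
    follows the negative binomial defined in nb_log_pmf.
    """
    nb = math.exp(nb_log_pmf(k, mu, phi))
    if k == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


# Hypothetical parameters: 30% structural zeros, mean 2, overdispersion 0.5.
probs = [zinb_pmf(k, pi=0.3, mu=2.0, phi=0.5) for k in range(5)]
print([round(p, 4) for p in probs])
```

Under this parameterization the zero-inflation weight π only adds mass at zero, which is why the overall mean is (1 − π)µ rather than µ.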

Data-based RNA-seq Simulations by Binomial Thinning

Gerard. 2019. Preprint.

With the explosion in the number of methods designed to analyze bulk and single-cell RNAseq data, there is a growing need for approaches that assess and compare these methods. The usual technique is to compare methods on data simulated according to some theoretical model. However, as real data often exhibit violations from theoretical models, this can result in unsubstantiated claims of a method's performance. Rather than generate data from a theoretical model, in this paper we develop methods to add signal to real RNA-seq datasets. Since the resulting simulated data are not generated from an unrealistic theoretical model, they exhibit realistic (annoying) attributes of real data. This lets RNA-seq methods developers assess their procedures in non-ideal (model-violating) scenarios. Our procedures may be applied to both single-cell and bulk RNA-seq. We show that our simulation method results in more realistic datasets and can alter the conclusions of a differential expression analysis study. We also demonstrate our approach by comparing various factor analysis techniques on RNA-seq datasets. Our tools are available in the seqgendiff R package on the Comprehensive R Archive Network: https://cran.r-project.org/package=seqgendiff.
