2019
DOI: 10.1038/s41467-018-07931-2

Single-cell RNA-seq denoising using a deep count autoencoder

Abstract: Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at a cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods for increasingly large but sparse scRNA-seq data are needed. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a negative binomial noise model with or without zero-inflation, a…
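To make the noise model named in the abstract concrete, here is a minimal sketch of the log-probability of a single count under a negative binomial (NB) model with and without zero-inflation. It is an illustration only, not the DCA implementation. The function names and the parameter names `mu` (mean), `theta` (dispersion, with variance `mu + mu**2 / theta`) and `pi` (probability of a structural zero) are assumptions chosen for this example.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log-probability of count y under NB(mean=mu, dispersion=theta).

    Assumed parametrization: variance = mu + mu**2 / theta, with mu > 0, theta > 0.
    """
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(y, mu, theta, pi):
    """Log-probability of count y under a zero-inflated NB.

    pi is the probability of a structural zero (0 <= pi < 1);
    pi = 0 recovers the plain NB model.
    """
    nb = nb_log_pmf(y, mu, theta)
    if y == 0:
        # A zero is either a structural zero or an NB draw that happens to be zero.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    # A positive count can only come from the NB component.
    return math.log(1.0 - pi) + nb
```

Summing the negative of these log-probabilities over all entries of a count matrix would give a reconstruction loss of the kind the abstract alludes to, under the stated assumptions.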

Citations: cited by 739 publications (500 citation statements)
References: 50 publications (67 reference statements)
“…We demonstrate that a regularization step, a common step in bulk RNA-seq analysis [Robinson et al, 2010, McCarthy et al, 2012], where parameter estimates are pooled across genes with similar mean abundance, can effectively overcome this challenge and yield reproducible models. Importantly, many statistical and deep-learning methods designed for single cell RNA-seq data utilize a negative binomial (or zero-inflated negative binomial) error model [Lopez et al, 2018, Eraslan et al, 2019]. Our results suggest that each of these methods could benefit by substituting a regularized model, and that including an additional parameter for zero-inflation could exacerbate the risk of overfitting.…”
Section: Discussion (mentioning)
confidence: 91%
“…For example, ZINB-WaVE by Risso et al [2018] models counts as ZINB in a special variant of factor analysis. scVI and DCA also use the ZINB noise model [Lopez et al, 2018, Eraslan et al, 2019], either for normalization and dimensionality reduction in Bayesian hierarchical models, or for a denoising autoencoder. These pioneering approaches extend beyond pre-processing and normalization, but rely on the accurate estimation of per-gene error models.…”
Section: Introduction (mentioning)
confidence: 99%
“…Systems biologists have derived and adapted numerical and computational methods for dimensionality reduction to allow for low-dimensional representation of single-cell data and deduction of cell states and fates (Van der Maaten and Hinton, 2008; Pierson and Yau, 2015; Linderman et al, 2017; Wang et al, 2017; Becht et al, 2018; Ding, Condon and Shah, 2018; Lopez et al, 2018; Risso et al, 2018; Eraslan et al, 2019; Townes et al, 2019).…”
Section: Discussion (mentioning)
confidence: 99%
“…However, given the distribution and sparsity of scRNA-seq data, complex, nonlinear transformations are often required to capture and visualize expression patterns. Unsupervised machine learning techniques and, more recently, deep learning methods, are being rapidly developed to assist researchers in single-cell transcriptomic analysis (Van der Maaten and Hinton, 2008; Pierson and Yau, 2015; Linderman et al, 2017; Wang et al, 2017; Becht et al, 2018; Ding, Condon and Shah, 2018; Lopez et al, 2018; Risso et al, 2018; Eraslan et al, 2019; Townes et al, 2019). Because these techniques condense cell features in the native space to a small number of latent dimensions for visualization, lost information can result in exaggerated or dampened cell-cell similarity.…”
Section: Introduction (mentioning)
confidence: 99%
“…The zero-inflated negative binomial distribution is sometimes used to model single-cell RNA-seq data as it can account for the abundance of zeros observed in such data [Miao et al, 2018, Eraslan et al, 2019]. A random variable y is distributed zero-inflated negative binomial, denoted y ∼ ZINB(π, µ, φ), if it is generated by the following hierarchical process:…”
Section: S12 Generalizing the Poisson Assumption (mentioning)
confidence: 99%
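The quoted statement is cut off before the hierarchical process itself, so the following is only a sketch of one common way to generate a ZINB(π, µ, φ) variable, not necessarily the formulation used in the citing paper. It assumes π is the probability of a structural zero, µ is the mean of the negative binomial component, and φ is an inverse-dispersion parameter (variance of that component `mu + mu**2 / phi`). The function names are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's product method.

    Adequate for the small rates used in this illustration; it becomes
    slow and inaccurate for very large lam.
    """
    limit = math.exp(-lam)
    k = 0
    p = 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def sample_zinb(pi, mu, phi, rng):
    """Draw one ZINB variate under the assumptions stated in the lead-in."""
    # Step 1: with probability pi, emit a structural zero.
    if rng.random() < pi:
        return 0
    # Step 2: draw a latent rate from a Gamma with shape phi and scale mu / phi,
    # so the rate has mean mu and variance mu**2 / phi.
    rate = rng.gammavariate(phi, mu / phi)
    # Step 3: draw the count from a Poisson with that rate; marginally this
    # Gamma-Poisson mixture is negative binomial.
    return sample_poisson(rate, rng)


rng = random.Random(0)
draws = [sample_zinb(0.3, 3.0, 2.0, rng) for _ in range(20000)]
```

Under these assumptions the expected mean of the draws is (1 − π)·µ, and the expected fraction of zeros is π + (1 − π)·(φ / (φ + µ))^φ, which can be compared against the empirical values in `draws`.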