A Python library for probabilistic analysis of single-cell omics data

Gayoso, Adam; Lopez, Romain; Xing, Galen; Boyeau, Pierre; Amiri, Valeh Valiollah Pour; Hong, Justin; Wu, Katherine; Jayasuriya, Michael; Mehlman, Edouard; Langevin, Maxime; Liu, Yining; Samaran, Jules; Misrachi, Gabriel; Nazaret, Achille; Clivio, Oscar; Xu, Chenling; Ashuach, Tal; Gabitto, Mariano I.; Lotfollahi, Mohammad; Svensson, Valentine; Beltrame, Eduardo da Veiga; Kleshchevnikov, Vitalii; Talavera-López, Carlos; Pachter, Lior; Theis, Fabian J.; Streets, Aaron; Jordan, Michael I.; Regier, Jeffrey; Yosef, Nir

doi:10.1038/s41587-021-01206-w

Cited by 253 publications

(205 citation statements)

References 17 publications

Supporting

Mentioning

204

Contrasting


“…scAR can precisely infer the native signals for protein data in CITE-seq and mRNA data in scRNAseq. Recent approaches [11,12,20,27,33,37] introduce deep learning technologies (such as AE and VAE) for these tasks and show great promise. These approaches generally design noise models based on data patterns (e.g., zero-inflation) and learn the model parameters through neural networks [11,13,33].…”

Section: Discussion (mentioning)

confidence: 99%

Probabilistic machine learning ensures accurate ambient denoising in droplet-based single-cell omics

Sheng, Lopes, et al. 2022

Preprint


Droplet-based single-cell omics, including single-cell RNA sequencing (scRNAseq), single cell CRISPR perturbations (e.g., CROP-seq) and single-cell protein and transcriptomic profiling (e.g., CITE-seq) hold great promise for comprehensive cell profiling and genetic screening at the single cell resolution, yet these technologies suffer from substantial noise, among which ambient signals present in the cell suspension may be the predominant source. Current efforts to address this issue are highly specific to a certain technology, while a universal model to describe the noise across these technologies may reveal this common source thereby improving the denoising accuracy. To this end, we explicitly examined these unexpected signals and observed a predictable pattern in multiple datasets across different technologies. Based on the finding, we developed single cell Ambient Remover (scAR) which uses probabilistic deep learning to deconvolute the observed signals into native and ambient composition. scAR provides an efficient and universal solution to count denoising for multiple types of single-cell omics data, including single cell CRISPR screens, CITE-seq and scRNAseq. It will facilitate the application of single-cell omics technologies.



“…The functions and the function f_w are encoder and decoder functions, respectively. To be as comparable as possible to PeakVI as implemented in scvi-tools [8,18] (v.0.15.0), we use the same architecture. Specifically, these functions consist of two repeated blocks of fully connected neural networks with a fixed number of hidden dimensions set to the square root of the number of input dimensions, a dropout layer, a layer-norm layer, and leakyReLU activation.…”

Section: Methods (mentioning)

confidence: 99%

Modeling fragment counts improves single-cell ATAC-seq analysis

Martens, Fischer, Theis, et al. 2022

Preprint

Self Cite


Single-cell ATAC-sequencing (scATAC-seq) coverage in regulatory regions is typically binarized as an indicator of open chromatin. However, the implications of scATAC-seq data binarization have never been assessed. Here, we show that the goodness-of-fit of existing models and their applications including clustering, cell type identification, and batch integration, are improved by a quantitative treatment of the fragment counts. These results have immediate implications for scATAC-seq analysis.


“…Amongst other data that can be optionally passed into PhyloVision are numerical and categorical metadata, as well as two-dimensional projections of the cells for visualization purposes (e.g. from t-distributed stochastic neighbor embedding [tSNE] (van der Maaten, 2008) of the main principal components or of an embedding learned by methods such as scVI (Lopez et al., 2018; Gayoso et al., 2022)). In the original VISION pipeline, cell-level clustering and consistency evaluation was performed on a user-specified “latent space” (a low-dimensional embedding such as the top principal components or an embedding inferred with tools like scVI).…”

Section: Methods (mentioning)

confidence: 99%

Interactive, integrated analysis of single-cell transcriptomic and phylogenetic data with PhyloVision
