2022
DOI: 10.1101/2022.05.04.490536
Preprint

Modeling fragment counts improves single-cell ATAC-seq analysis

Abstract: Single-cell ATAC-sequencing (scATAC-seq) coverage in regulatory regions is typically binarized as an indicator of open chromatin. However, the implications of scATAC-seq data binarization have never been assessed. Here, we show that the goodness-of-fit of existing models and their applications, including clustering, cell type identification, and batch integration, are improved by a quantitative treatment of the fragment counts. These results have immediate implications for scATAC-seq analysis.
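
To make the contrast in the abstract concrete, the following is a minimal sketch of what binarization discards. It is not the authors' code: the toy matrix and the helper names `binarize` and `poisson_prob_open` are hypothetical, and the Poisson relation P(count > 0) = 1 - exp(-rate) is used only as a standard illustration of how a count model still implies an open/closed probability.

```python
import math


def binarize(counts):
    """Map each fragment count to 1 if any fragment was observed, else 0."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def poisson_prob_open(rate):
    """P(count > 0) under a Poisson model with the given rate."""
    return 1.0 - math.exp(-rate)


# Hypothetical toy matrix: rows are cells, columns are peaks.
counts = [
    [0, 1, 4],
    [0, 2, 1],
]

binary = binarize(counts)

# The two cells have different fragment counts but identical binary profiles,
# so any difference carried by the counts is invisible after binarization.
print(binary)

# A count model keeps the rate; the binary view only sees P(count > 0).
print(poisson_prob_open(0.5))
```

Under these assumptions, a model fitted to `binary` cannot distinguish the two cells, whereas a model fitted to `counts` can.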

Cited by 13 publications
(9 citation statements)
References 32 publications
“…In scATAC-seq data, the most common normalization strategy is binarization of peaks 136, 140, 141. However, this may also remove biological information and therefore modelling of scATAC counts directly has been suggested 142. Dimensionality reduction methods based on latent semantic indexing (ArchR 140 and Signac 143), latent Dirichlet allocation (cisTopic 141) and spectral embedding (snapATAC 136) were shown to perform best for downstream clustering and cell annotation 135.…”
Section: Chromatin Accessibility (mentioning)
confidence: 99%
“…We used scVI 43 to perform dimensionality reduction and batch effect removal of the spatial transcriptomics and spatial epigenomics samples (Visium, CosMx, MERFISH, and RNA+ATAC). The transcriptomics data was modeled using a zero-inflated negative binomial distribution 43 , while the chromatin accessibility data was modeled with a Poisson distribution 85 . Spatial proteomics, which quantifies the expression of tens or hundreds of proteins, has lower dimensionality, making dimensionality reduction potentially unnecessary.…”
Section: Dimensionality Reduction and Batch Effect Removal (mentioning)
confidence: 99%
“…We downloaded the Mouse P22 spatial RNA+ATAC dataset and independently ran dimensionality reduction on each modality using scvi-tools 93 . For ATAC-seq, we selected the fragments expressed in at least 1% of the spots and executed scVI with n_hidden = 334 and n_latent = 18, determined automatically as the square [root] of the number of unique fragments and the square [root] of n_hidden, respectively, n_layers = 1 and employed Poisson likelihood as it was shown to outperform binary approaches 85 . For RNA-seq, we performed CPM and log2 normalization, selected the 5000 most highly variable genes, and ran scVI with n_hidden = 334, n_latent = 18 (to maintain the same embedding size across modalities), and n_layers = 2.…”
Section: Spatial Clustering Of Multi-omics Mouse Brain Data (mentioning)
confidence: 99%
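
The sizes quoted in the statement above (n_hidden = 334, n_latent = 18) are consistent with a square-root rule, since √334 ≈ 18.3. The sketch below shows that arithmetic under stated assumptions: the helper `scvi_layer_sizes` is hypothetical, rounding to the nearest integer is a guess at how the citing authors obtained integers, and the feature count 111,556 (= 334²) is chosen for illustration only because the actual number of fragments is not given.

```python
import math


def scvi_layer_sizes(n_features):
    """Derive (n_hidden, n_latent) with a square-root heuristic.

    Assumption: values are rounded to the nearest integer; the citing
    paper does not state how non-integer roots were handled.
    """
    n_hidden = round(math.sqrt(n_features))
    n_latent = round(math.sqrt(n_hidden))
    return n_hidden, n_latent


# Hypothetical feature count chosen so that the square root is exactly 334.
n_hidden, n_latent = scvi_layer_sizes(111_556)
print(n_hidden, n_latent)
```

With this illustrative input the rule reproduces the reported pair, which would not be the case if the sizes were literal squares of the inputs.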
“…For empirical ATAC-seq data, these regions M are determined by data-dependent peak calling, where peaks are regarded as the set of candidate CREs 27,28 . As snATAC-seq can recover quantitative information on the density and distribution of nucleosomes 24,29 , we use integer values Y_cm ∈ {0, 1, 2, …} to represent the level of accessibility. Existing pipelines diverge in the quantification of snATAC-seq counts, and we propose to use the paired insertion count (PIC) matrix as a uniform input for downstream analyses 24 .…”
Section: Results (mentioning)
confidence: 99%