EpiScanpy: integrated single-cell epigenomic analysis

Danese, Anna; Richter, Maria L.; Chaichoompu, Kridsadakorn; Fischer, David; Theis, Fabian J.; Colomé-Tatché, Maria

doi:10.1038/s41467-021-25131-3

Cited by 79 publications

(67 citation statements)

References 37 publications

(43 reference statements)

“…To determine whether scTriangulate multimodal integrated results: (1) correspond to well-described cell states, (2) have improved accuracy over alternative approaches, and (3) reveal new discrete cell populations, we first applied it to several independent human immune single-cell datasets assayed with four distinct approaches: scRNA-Seq (RNA), CITE-Seq (ADT + RNA), multiome (ATAC + RNA) and TEA-Seq (ADT + ATAC + RNA). For the analysis of snATAC-Seq, scTriangulate adopts a modified version of epiScanpy [17] to collect peak-level information for the ATAC cell clusters. For ADTs, scTriangulate uses Centered Log Ratio (CLR) normalization [1].…”

Section: Results (mentioning)

confidence: 99%
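
The statement above mentions Centered Log Ratio (CLR) normalization for ADT counts. As a minimal sketch of the textbook transform (log of each value minus the mean of the logs), assuming a pseudocount of 1 to handle zeros and a single vector of counts, the idea looks like this. The function name `clr` and the pseudocount are illustrative assumptions, not the scTriangulate implementation, and published tools differ in the exact variant and in whether they normalize across cells or across features.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log ratio of one vector of non-negative counts.

    Assumption: a pseudocount is added before taking logs so that
    zero counts are defined. Each output value is the log count
    minus the mean log count of the vector.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical ADT counts for four antibodies in one cell.
example = clr([0, 5, 20, 100])
```

By construction the transformed values sum to zero, so each value is read relative to the geometric mean of the vector rather than as an absolute abundance.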

Decision level integration of unimodal and multimodal single cell data with scTriangulate

Song, Singh, et al. 2023, Nat Commun

Decisively delineating cell identities from uni- and multimodal single-cell datasets is complicated by diverse modalities, clustering methods, and reference atlases. We describe scTriangulate, a computational framework to mix-and-match multiple clustering results, modalities, associated algorithms, and resolutions to achieve an optimal solution. Rather than ensemble approaches which select the “consensus”, scTriangulate picks the most stable solution through coalitional iteration. When evaluated on diverse multimodal technologies, scTriangulate outperforms alternative approaches to identify high-confidence cell-populations and modality-specific subtypes. Unlike existing integration strategies that rely on modality-specific joint embedding or geometric graphs, scTriangulate makes no assumption about the distributions of raw underlying values. As a result, this approach can solve unprecedented integration challenges, including the ability to automate reference cell-atlas construction, resolve clonal architecture within molecularly defined cell-populations and subdivide clusters to discover splicing-defined disease subtypes. scTriangulate is a flexible strategy for unified integration of single-cell or multimodal clustering solutions, from nearly unlimited sources.

“…We conducted QC based on both RNA and ATAC peaks. We filtered out nuclei with min_genes < 300, min_counts < 500, pct_counts_mt > 20% for RNA data, together with the additional criteria for at least 1000 peaks/nucleus in the ATAC data based on episcanpy [17] tutorial. Taken together, 10,991 nuclei were kept for further analysis.…”

Section: Methods (mentioning)

confidence: 99%
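
The QC thresholds quoted above can be read as a per-nucleus filter. The following is a minimal sketch in plain Python, assuming each nucleus is summarized by four numbers. The class `Nucleus`, its field names, and the function `passes_qc` are hypothetical names chosen for illustration and are not the episcanpy or scanpy API. Whether the boundary values themselves pass is an assumption here, since the statement only says which nuclei were filtered out.

```python
from dataclasses import dataclass


@dataclass
class Nucleus:
    # Hypothetical per-nucleus QC summary; field names are illustrative.
    n_genes: int        # genes detected in the RNA modality
    n_counts: int       # total RNA counts
    pct_mt: float       # percent of RNA counts from mitochondrial genes
    n_peaks: int        # ATAC peaks detected


def passes_qc(nucleus):
    """Keep a nucleus only if it clears every quoted threshold."""
    return (
        nucleus.n_genes >= 300
        and nucleus.n_counts >= 500
        and nucleus.pct_mt <= 20.0
        and nucleus.n_peaks >= 1000
    )


# Made-up nuclei: one passing, three failing a single criterion each.
nuclei = [
    Nucleus(n_genes=1200, n_counts=3000, pct_mt=5.0, n_peaks=4000),
    Nucleus(n_genes=250, n_counts=3000, pct_mt=5.0, n_peaks=4000),
    Nucleus(n_genes=1200, n_counts=3000, pct_mt=35.0, n_peaks=4000),
    Nucleus(n_genes=1200, n_counts=3000, pct_mt=5.0, n_peaks=600),
]
kept = [n for n in nuclei if passes_qc(n)]
```

Combining the RNA and ATAC criteria with a logical AND means a nucleus must look healthy in both modalities to be retained.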

Decision level integration of unimodal and multimodal single cell data with scTriangulate

Song, Singh, et al. 2023, Nat Commun

“…Louvain and Leiden, which require a resolution parameter but not the number of clusters. To obtain the desired number of clusters, a binary search strategy is usually adopted (Chen et al, 2019, 2021; Danese et al, 2021). However, each attempt in the search process is time-consuming, especially for large data.…”

Section: Methods (mentioning)

confidence: 99%
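
The binary search strategy mentioned above can be sketched as follows, assuming the number of clusters does not decrease as the resolution increases. The function `search_resolution` and the stand-in `toy_count` are hypothetical names for illustration. In a real workflow `count_clusters` would run Louvain or Leiden at the given resolution and return the number of communities, which is the expensive step the statement refers to.

```python
def search_resolution(count_clusters, target, low=0.0, high=3.0, max_steps=30):
    """Bisect the resolution until the clustering yields `target` clusters.

    count_clusters: callable mapping a resolution to a cluster count,
        assumed to be non-decreasing in the resolution.
    Returns a matching resolution, or None if none is found in
    [low, high] within max_steps evaluations.
    """
    for _ in range(max_steps):
        mid = (low + high) / 2.0
        n_clusters = count_clusters(mid)  # one full clustering run per step
        if n_clusters == target:
            return mid
        if n_clusters < target:
            low = mid   # too few clusters: raise the resolution
        else:
            high = mid  # too many clusters: lower the resolution
    return None


def toy_count(resolution):
    # Stand-in for a community detection run; monotone in the resolution.
    return int(resolution * 10) + 1


resolution = search_resolution(toy_count, target=8)
```

Each iteration costs one clustering run, so the number of steps, not the arithmetic, dominates the running time on large datasets.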

“…Although almost all the widely-used scCAS data analysis workflows, e.g. Signac (Stuart et al, 2021), ArchR (Granja et al, 2021) and EpiScanpy (Danese et al, 2021), adopted community detection-based techniques to find the best possible grouping, the estimation of the number of cell types in scCAS data is still typically subjective and largely relied on the investigator’s desired clustering resolution and/or prior knowledge (Supplementary Text S2).…”

Section: Introduction (mentioning)

confidence: 99%

ASTER: accurately estimating the number of cell types in single-cell chromatin accessibility data

et al. 2022

Summary: Recent innovations in single-cell chromatin accessibility sequencing (scCAS) have revolutionized the characterization of epigenomic heterogeneity. Estimation of the number of cell types is a crucial step for downstream analyses and biological implications. However, efforts to perform estimation specifically for scCAS data are limited. Here we propose ASTER, an ensemble learning-based tool for accurately estimating the number of cell types in scCAS data. ASTER outperformed baseline methods in systematic evaluation on 27 datasets of various protocols, sizes, numbers of cell types, degrees of cell-type imbalance, cell states, and qualities, providing valuable guidance for scCAS data analysis.

Availability and implementation: ASTER along with detailed documentation is freely accessible at https://aster.readthedocs.io/ under the MIT License. It can be seamlessly integrated into existing scCAS analysis workflows. The source code is available at https://github.com/biox-nku/aster.

Supplementary information: Supplementary data are available at Bioinformatics online.

“…In short, the RNA-seq data is preprocessed as detailed in Section 4.7.1, with the additional filtering of cells with > 25000 or < 1000 counts and < 20% mitochondrial counts of total. For the ATACseq data we used epiScanpy [73], filtering peaks in < 10 cells and cells with < 5000 or > 7 × 10^4 counts, and with a variability score < 0.515. Final data contains 10411 cells, 21601 genes and 75111 peaks. Cell types are annotated using the reference PBMC dataset [58] passed to scanpy's [74] inject label transfer function, resulting in 8 annotated celltypes (Figure 5A).…”

Section: PBMC Multi-ome Dataset Preprocessing (mentioning)

confidence: 99%

Structure Primed Embedding on the Transcription Factor Manifold Enables Transparent Model Architectures for Gene Regulatory Network and Latent Activity Inference

Tjärnberg, Beheler-Amass, Jackson, et al. 2023, Preprint

The modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets with expression as a proxy for the upstream independent features, complicating validation and predictions produced by modeling frameworks. Separating covariance and regulatory influence requires aggregation of independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g. TF activity (TFA) is unknown due to a lack of experimental feasibility, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA and elucidates functional regulatory pathways through aggregation of TFs.
