Background: Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. scATAC-seq experiments sample DNA, which, due to low copy numbers (diploid in humans), leads to inherent data sparsity (1-10% of peaks detected per cell) compared to transcriptomic (scRNA-seq) data (10-45% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level. Results: We present a benchmarking framework that is applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms. Methods for processing and featurizing scATAC-seq data were compared by their ability to discriminate cell types when combined with common unsupervised clustering approaches. We rank evaluated methods and discuss computational challenges associated with scATAC-seq analysis including inherently sparse data, determination of features, peak calling, the effects of sequencing coverage and noise, and clustering performance. Running times and memory requirements are also discussed. Conclusions: This reference summary of scATAC-seq methods offers recommendations for best practices with consideration for both the non-expert user and the methods developer. Despite variation across methods and datasets, SnapATAC, Cusanovich2018, and cisTopic outperform other methods in separating cell populations of different coverages and noise levels in both synthetic and real datasets. Notably, SnapATAC is the only method able to analyze a large dataset (> 80,000 cells).
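The sparsity figures quoted in the abstract above (the share of peaks detected per cell) can be illustrated with a minimal sketch. It assumes a cell-by-peak count matrix with cells as rows and peaks as columns; the function name `detection_rate_per_cell` and the toy values are hypothetical and are not taken from the benchmarked methods.

```python
from typing import List


def detection_rate_per_cell(matrix: List[List[int]]) -> List[float]:
    """Return, for each cell (row), the fraction of peaks (columns)
    that have at least one read."""
    rates = []
    for row in matrix:
        if not row:
            raise ValueError("each cell needs at least one feature")
        detected = sum(1 for count in row if count > 0)
        rates.append(detected / len(row))
    return rates


# Toy cell-by-peak count matrix (hypothetical values): 3 cells x 10 peaks.
toy_counts = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 1 of 10 peaks detected
    [0, 0, 2, 0, 0, 0, 0, 1, 0, 0],  # 2 of 10 peaks detected
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # empty cell, nothing detected
]

rates = detection_rate_per_cell(toy_counts)
for cell_index, rate in enumerate(rates):
    print(f"cell {cell_index}: {rate:.0%} of peaks detected")
```

On real data the matrix would be stored in a sparse format, but the quantity computed is the same per-cell detection fraction.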
Background: Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. scATAC-seq experiments sample DNA, which, due to low copy numbers (diploid in humans), leads to inherent data sparsity (1-10% of peaks detected per cell) compared to transcriptomic (scRNA-seq) data (20-50% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level. Results: We present a benchmarking framework that was applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms. Methods for processing and featurizing scATAC-seq data were evaluated by their ability to discriminate cell types when combined with common unsupervised clustering approaches. We rank evaluated methods and discuss computational challenges associated with scATAC-seq analysis including inherently sparse data, determination of features, peak calling, the effects of sequencing coverage and noise, and clustering performance. Running times and memory requirements are also discussed. Conclusions: This reference summary of scATAC-seq methods offers recommendations for best practices with consideration for both the non-expert user and the methods developer. Despite variation across methods and datasets, SnapATAC, Cusanovich2018, and cisTopic outperform other methods in separating cell populations of different coverages and noise levels in both synthetic and real datasets. Notably, SnapATAC was the only method able to analyze a large dataset (> 80,000 cells).
Recent advances in single-cell omics technologies enable the individual or joint profiling of cellular measurements including gene expression, epigenetic features, chromatin structure and DNA sequences. Currently, most single-cell analysis pipelines are cluster-centric, i.e., they first cluster cells into non-overlapping cellular states and then extract their defining genomic features. These approaches assume that discrete clusters correspond to biologically relevant subpopulations and do not explicitly model the interactions between different feature types. However, cellular processes are defined in individual cells and inherently involve multiple genomic features that interact with each other and together provide complementary views on principles of gene regulation. In addition, single-cell methods are generally designed for a particular task, as distinct single-cell problems are formulated differently. To address these shortcomings, we present SIMBA, a single-cell embedding method that embeds single cells along with their defining features, such as genes, chromatin-accessible regions, and transcription factor binding sequences, into a common latent space. By leveraging the co-embedding of cells and features, SIMBA allows for the study of cellular heterogeneity, clustering-free marker discovery, gene regulation inference, batch effect removal, and omics data integration. SIMBA has been extensively applied to scRNA-seq, scATAC-seq, and dual-omics data. We show that SIMBA provides a single framework that allows diverse single-cell analysis problems to be formulated in a common way and thus simplifies the development of new analyses and integration of other single-cell modalities.
Single-cell assays have transformed our ability to model heterogeneity within cell populations. As these assays have advanced in their ability to measure various aspects of molecular processes in cells, computational methods to analyze and meaningfully visualize such data have required matched innovation. Independently, Virtual Reality (VR) has recently emerged as a powerful technology to dynamically explore complex data and shows promise for adaptation to challenges in single-cell data visualization. However, adopting VR for single-cell data visualization has thus far been hindered by expensive prerequisite hardware or advanced data preprocessing skills. To address these shortcomings, we present singlecellVR, a user-friendly web application for visualizing single-cell data, designed for cheap and easily available virtual reality hardware (e.g., Google Cardboard, ∼$8). singlecellVR can visualize data from a variety of sequencing-based technologies including transcriptomic, epigenomic, and proteomic data as well as combinations thereof. Analysis modalities supported include approaches to clustering as well as trajectory inference and visualization of dynamical changes discovered through modelling RNA velocity. We provide a companion software package, scvr, to streamline data conversion from the most widely adopted single-cell analysis tools, as well as a growing database of pre-analyzed datasets to which users can contribute.
Most current single-cell analysis pipelines are limited to cell embeddings and rely heavily on clustering, while lacking the ability to explicitly model interactions between different feature types. Furthermore, these methods are tailored to specific tasks, as distinct single-cell problems are formulated differently. To address these shortcomings, here we present SIMBA, a graph embedding method that jointly embeds single cells and their defining features, such as genes, chromatin-accessible regions and DNA sequences, into a common latent space. By leveraging the co-embedding of cells and features, SIMBA allows for the study of cellular heterogeneity, clustering-free marker discovery, gene regulation inference, batch effect removal and omics data integration. We show that SIMBA provides a single framework that allows diverse single-cell problems to be formulated in a unified way and thus simplifies the development of new analyses and extension to new single-cell modalities. SIMBA is implemented as a comprehensive Python library (https://simba-bio.readthedocs.io).
Although vast numbers of putative gene regulatory elements have been cataloged, the sequence motifs and individual bases that underlie their functions remain largely unknown. Here we combine deep learning, epigenetic perturbations and base editing to dissect regulatory sequences within the exemplar immune locus encoding CD69. Focusing on a differentially accessible and acetylated upstream enhancer, we find that the complementary strategies converge on a ~150 base interval as critical for CD69 induction in stimulated Jurkat T cells. We pinpoint individual cytosine to thymine base edits that markedly reduce element accessibility and acetylation, with corresponding reduction of CD69 expression. The most potent base edits may be explained by their effect on binding competition between the transcriptional activator GATA3 and the repressor BHLHE40. Systematic analysis of GATA and bHLH/Ebox motifs suggests that interplay between these factors plays a general role in rapid T cell transcriptional responses. Our study provides a framework for parsing gene regulatory elements in their endogenous chromatin contexts and identifying operative engineered variants.