Highlights
• Seurat v3 identifies correspondences between cells in different experiments
• These "anchors" can be used to harmonize datasets into a single reference
• Reference labels and data can be projected onto query datasets
• Extends beyond RNA-seq to single-cell protein, chromatin, and spatial data
Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to "anchor" diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but also across different modalities. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets.
Availability: Installation instructions, documentation, and tutorials are available at: https://www.satijalab.org/seurat
... effective, they can also struggle in cases where only a subset of cell types are shared across datasets, or where significant technical variation masks shared biological signal.
Additionally, these methods focus on scRNA-seq and are not designed to integrate information across different modalities, nor do they enable the transfer of information from one dataset to another.
Here, we present a unified strategy for reference assembly and transfer learning for transcriptomic, epigenomic, proteomic, and spatially-resolved single cell data. Through the identification of pairwise correspondences between single cells across datasets, termed "anchors", we can transform datasets into a shared space, even in the presence of extensive technical and/or biological differences. This enables the construction of harmonized atlases at the tissue or organismal scale. These anchors also enable effective transfer of discrete or continuous data from a reference onto a query dataset. This allows for the transfer of cell labels learned from scRNA-seq onto scATAC-seq data to explore differences in the regulatory landscape between distinct interneuron subsets, and the transfer of protein measurements onto massive public resources to characterize lymphoid populations in human bone marrow. Finally, the anchoring of STARmap and scRNA-seq datasets enables the transcriptome-wide imputation of spatial gene expression pattern...
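The idea of anchors as pairwise correspondences, and of using them to transfer labels from a reference to a query, can be illustrated with a small Python sketch. This is not Seurat's implementation. It assumes, for illustration only, that both datasets already sit in a shared low-dimensional space, that an anchor is a pair of cells that are mutual nearest neighbors, and that labels are transferred by a Gaussian-weighted vote over anchors. The function names (`find_anchors`, `transfer_labels`) and the parameters `k` and `sigma` are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a (n x d) and b (m x d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff * diff, axis=2)

def find_anchors(reference, query, k=2):
    """Return (ref_idx, query_idx) pairs that are mutual k-nearest neighbors.

    Assumption: both matrices are already embedded in a shared space.
    """
    d = pairwise_sq_dists(reference, query)
    ref_to_query = np.argsort(d, axis=1)[:, :k]    # k nearest query cells per reference cell
    query_to_ref = np.argsort(d, axis=0)[:k, :].T  # k nearest reference cells per query cell
    anchors = []
    for r in range(reference.shape[0]):
        for q in ref_to_query[r]:
            if r in query_to_ref[q]:               # keep only mutual neighbors
                anchors.append((r, int(q)))
    return anchors

def transfer_labels(query, anchors, ref_labels, sigma=1.0):
    """Predict one label per query cell from distance-weighted anchor votes.

    Assumption: at least one anchor exists; sigma is an illustrative kernel width.
    """
    anchor_q = np.array([q for _, q in anchors])
    anchor_labels = np.array([ref_labels[r] for r, _ in anchors])
    classes = sorted(set(ref_labels))
    d = pairwise_sq_dists(query, query[anchor_q])
    weights = np.exp(-d / (2.0 * sigma ** 2))      # closer anchors get larger weight
    scores = np.stack(
        [weights[:, anchor_labels == c].sum(axis=1) for c in classes], axis=1
    )
    return [classes[i] for i in np.argmax(scores, axis=1)]

# Toy data: two well-separated groups of cells in a 2-D shared space.
reference = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
ref_labels = ["A", "A", "B", "B"]
query = np.array([[0.2, 0.1], [0.1, 0.9], [9.8, 10.2], [10.1, 10.8], [0.5, 0.5]])

anchors = find_anchors(reference, query, k=2)
print(transfer_labels(query, anchors, ref_labels))  # ['A', 'A', 'B', 'B', 'A']
```

Requiring neighbors to be mutual is one simple way to avoid pairing cells whose type is present in only one dataset, since such cells tend not to be chosen in return.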
Recent high-throughput single-cell sequencing approaches have been transformative for understanding complex cell populations, but are unable to provide additional phenotypic information, such as protein levels of cell-surface markers. Using oligonucleotide-labeled antibodies, we integrate measurements of cellular proteins and transcriptomes into an efficient, sequencing-based readout of single cells. This method is compatible with existing single-cell sequencing approaches and will readily scale as the throughput of these methods increases.
Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from "regularized negative binomial regression," where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure omits the need for heuristic steps including pseudocount addition or log-transformation and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat.
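The two ingredients described above, Pearson residuals from a negative binomial model with sequencing depth as a covariate and pooling of parameter estimates across genes of similar abundance, can be sketched in Python as follows. This is a simplified illustration, not the sctransform code. It assumes a log link with log10 of total UMI count as the covariate, an inverse-dispersion parameter `theta`, and a Gaussian kernel smoother as the pooling step. The function names and the `bandwidth` parameter are hypothetical, and the per-gene model fitting itself is omitted.

```python
import numpy as np

def nb_pearson_residuals(counts, depth, beta0, beta1, theta):
    """Pearson residuals for one gene under a negative binomial GLM.

    counts: UMI counts for the gene in each cell
    depth:  total UMI count per cell (the sequencing-depth covariate)
    beta0, beta1, theta: assumed to be already fitted and regularized
    """
    counts = np.asarray(counts, dtype=float)
    depth = np.asarray(depth, dtype=float)
    mu = np.exp(beta0 + beta1 * np.log10(depth))  # expected count given depth (log link)
    variance = mu + mu ** 2 / theta               # negative binomial variance
    return (counts - mu) / np.sqrt(variance)

def pool_across_genes(log_mean, estimates, bandwidth=0.5):
    """Smooth per-gene parameter estimates against log mean abundance.

    Each gene's estimate is replaced by a Gaussian-weighted average over genes
    with similar abundance, an illustrative stand-in for the pooling step.
    """
    log_mean = np.asarray(log_mean, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    diff = log_mean[:, None] - log_mean[None, :]
    w = np.exp(-0.5 * (diff / bandwidth) ** 2)
    return (w @ estimates) / w.sum(axis=1)

# With beta1 = ln(10) and beta0 = ln(0.001), the expected count is depth / 1000.
beta0, beta1 = np.log(0.001), np.log(10.0)
residuals = nb_pearson_residuals([2, 10], [1000, 10000], beta0, beta1, theta=10.0)
print(residuals)  # first cell is above expectation, second is at expectation
```

A cell whose count matches the depth-dependent expectation gets a residual near zero, so differences in sequencing depth alone no longer show up as variation in the transformed data.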
Organisms from all domains of life use gene regulation networks to control cell growth, identity, function, and responses to environmental challenges. Although accurate global regulatory models would provide critical evolutionary and functional insights, they remain incomplete, even for the best studied organisms. Efforts to build comprehensive networks are confounded by challenges including network scale, degree of connectivity, complexity of organism–environment interactions, and difficulty of estimating the activity of regulatory factors. Taking advantage of the large number of known regulatory interactions in Bacillus subtilis and two transcriptomics datasets (including one with 38 separate experiments collected specifically for this study), we use a new combination of network component analysis and model selection to simultaneously estimate transcription factor activities and learn a substantially expanded transcriptional regulatory network for this bacterium. In total, we predict 2,258 novel regulatory interactions and recall 74% of the previously known interactions. We obtained experimental support for 391 (out of 635 evaluated) novel regulatory edges (62% accuracy), thus significantly increasing our understanding of various cell processes, such as spore formation.
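One part of this approach, estimating transcription factor activities from expression data and a known network, can be illustrated with a minimal Python sketch. It assumes the common linear formulation in which expression is approximated by a connectivity matrix multiplied by an activity matrix, and it performs only a single least-squares step with the connectivity held fixed. The full method described above also learns the network and applies model selection, which this sketch does not attempt. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def estimate_tf_activities(expression, connectivity):
    """Least-squares TF activities P such that expression ~ connectivity @ P.

    expression:   genes x conditions matrix
    connectivity: genes x TFs matrix, zero where no regulatory edge is known
    """
    activities, *_ = np.linalg.lstsq(connectivity, expression, rcond=None)
    return activities

# Toy network: 4 genes, 2 transcription factors, 3 conditions.
connectivity = np.array([[1.0, 0.0],
                         [1.0, 0.0],
                         [0.0, 1.0],
                         [1.0, -1.0]])   # last gene is activated by TF 1, repressed by TF 2
true_activities = np.array([[1.0, 2.0, 0.5],
                            [0.0, 1.0, 3.0]])
expression = connectivity @ true_activities

print(estimate_tf_activities(expression, connectivity))
```

In this noise-free toy case the recovered activities match the ones used to generate the data. With real measurements the fit is only approximate, and the quality of the prior network matters.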
Summary: Diverse subsets of cortical interneurons play vital roles in higher-order brain functions. To investigate how this diversity is generated, we used single cell RNA-seq to profile the transcriptomes of murine cells collected along a developmental timecourse. Heterogeneity within mitotic progenitors in the ganglionic eminences is driven by a highly conserved maturation trajectory, alongside eminence-specific transcription factor expression that seeds the emergence of later diversity. Upon becoming postmitotic, progenitors diverge and differentiate into transcriptionally distinct states, including an interneuron precursor state. By integrating datasets across developmental timepoints, we identified shared sources of transcriptomic heterogeneity between adult interneurons and their precursors, revealing the embryonic emergence of interneuron cardinal subtypes. Our analysis revealed that the ASD-associated transcription factor Mef2c delineates early Pvalb-precursors, and is essential for their development. These findings shed new light on the molecular diversification of early inhibitory precursors, and identify gene modules that may influence the specification of human subtypes.
Diverse subsets of cortical interneurons play a particularly important role in the stability of the neural circuits underlying cognitive and higher order brain functions, yet our understanding of how this diversity is generated is far from complete. We applied massively parallel single-cell RNA-seq to profile a developmental time course of interneuron development, measuring the transcriptomes of over 60,000 progenitors during their maturation in the ganglionic eminences and embryonic migration into the cortex. While diversity within mitotic progenitors is largely driven by cell cycle and differentiation state, we observed sparse eminence-specific transcription factor expression, which seeds the emergence of later cell diversity. Upon becoming postmitotic, cells from all eminences pass through one of three precursor states, one of which represents a cortical interneuron ground state. By integrating datasets across developmental timepoints, we identified transcriptomic heterogeneity in interneuron precursors representing the emergence of four cardinal classes (Pvalb, Sst, Id2 and Vip), which further separate into subtypes at different timepoints during development. Our analysis revealed that the ASD-associated transcription factor Mef2c discriminates early Pvalb-precursors in E13.5 cells, and removal of Mef2c confirms its essential role for Pvalb interneuron development. These findings shed new light on the molecular diversification of early inhibitory precursors, and suggest gene modules that may link developmental specification with the etiology of neuropsychiatric disorders.