Recent technical developments have enabled the transcriptomes of hundreds of cells to be assayed in an unbiased manner, opening up the possibility that new subpopulations of cells can be found. However, the effects of potential confounding factors, such as the cell cycle, on the heterogeneity of gene expression and therefore on the ability to robustly identify subpopulations remain unclear. We present and validate a computational approach that uses latent variable models to account for such hidden factors. We show that our single-cell latent variable model (scLVM) allows the identification of otherwise undetectable subpopulations of cells that correspond to different stages during the differentiation of naive T cells into T helper 2 cells. Our approach can be used not only to identify cellular subpopulations but also to tease apart different sources of gene expression heterogeneity in single-cell transcriptomes.
Single-cell measurements of gene expression, using imaging techniques such as RNA-FISH (fluorescence in situ hybridization), have provided important insights into the kinetics of transcription and cell-to-cell variation in gene expression [1-3]. However, such approaches can examine the expression of only a small number of genes in each experiment, thus restricting our ability to examine co-expression patterns and to robustly identify subpopulations of cells. Protocols have been developed to overcome these limitations by amplifying small quantities of mRNA [4,5], which, in combination with microfluidics approaches for isolating individual cells [6,7], have been used to analyze the co-expression of tens to hundreds of genes in single cells [8,9]. These protocols also allow the entire transcriptome of large numbers of single cells to be assayed in an unbiased way. This was initially done using microarrays [10,11] but is more often now done using next-generation sequencing [12-15].
Such approaches have been used to model early embryogenesis in the mouse [16] and to investigate bimodality in gene expression patterns of differentiating immune cell types [17].
After the generation of single-cell RNA-sequencing (RNA-seq) profiles from hundreds of cells, one goal is to identify subpopulations that share a common gene-expression profile. Some of these subpopulations may represent previously unidentified cell types. Additionally, by studying patterns of gene expression in different single cells, insights into the regulatory landscape of each cell population can be obtained.
However, methods for identifying subpopulations of cells and modeling their gene regulatory landscapes are only now beginning to emerge [18,19]. To fully exploit single-cell RNA-seq data, we have to account for the random noise inherent to such data sets [20] and, equally important, for the different hidden factors that might result in gene expression heterogeneity. Although the importance of accounting for unobserved factors is well established in bulk RNA-seq studies [21-23], robust approaches to detect and account for confounding f...
The temporal order of differentiating cells is intrinsically encoded in their single-cell expression profiles. We describe an efficient way to robustly estimate this order according to diffusion pseudotime (DPT), which measures transitions between cells using diffusion-like random walks. Our DPT software implementations make it possible to reconstruct the developmental progression of cells and identify transient or metastable states, branching decisions and differentiation endpoints.
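The idea of ordering cells by diffusion-like random walks can be illustrated with a minimal sketch. This is not the authors' DPT software: it assumes a Gaussian kernel with a hand-picked bandwidth `sigma`, a dense eigendecomposition that only suits small data sets, and a root cell chosen by the user. The function name `diffusion_pseudotime` and its parameters are hypothetical.

```python
import numpy as np

def diffusion_pseudotime(X, root, sigma=1.0):
    """Toy diffusion pseudotime. X is (cells x features); root is the start cell index."""
    # Pairwise squared Euclidean distances and a Gaussian kernel (assumed kernel choice).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    # Density normalisation, an assumption of this sketch.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)
    # Symmetrically normalised transition matrix, so eigh can be used.
    d = K.sum(axis=1)
    T = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(T)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Drop the leading eigenpair (eigenvalue 1, the stationary component).
    lam, psi = evals[1:], evecs[:, 1:]
    # Summing random walks of all lengths weights each eigenvector by lam / (1 - lam).
    emb = psi * (lam / (1.0 - lam))
    # Pseudotime of a cell = distance between its row and the root's row.
    return np.linalg.norm(emb - emb[root], axis=1)

# Usage on toy data: cells placed along a one-dimensional trajectory.
cells = np.linspace(0.0, 1.0, 20)[:, None]
pseudotime = diffusion_pseudotime(cells, root=0, sigma=0.1)
ordering = np.argsort(pseudotime)  # estimated temporal order of the cells
```

The dense distance matrix makes this quadratic in the number of cells, so a real analysis would need a sparse nearest-neighbour approximation.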
Multi‐omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi‐Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi‐omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy‐chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single‐cell multi‐omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
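The distinction between factors shared across modalities and factors specific to one modality can be illustrated with a small sketch. It does not reproduce MOFA's inference: it swaps in a plain singular value decomposition of the column-centred, concatenated views and then reports the fraction of variance each factor explains in each view. The names `joint_factors` and `variance_explained` are hypothetical, and the sketch assumes complete data with the same samples in every view.

```python
import numpy as np

def joint_factors(views, n_factors):
    """views: list of (samples x features) arrays that share the same samples."""
    centred = [v - v.mean(axis=0) for v in views]
    stacked = np.hstack(centred)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    Z = u[:, :n_factors] * s[:n_factors]  # factor values, samples x factors
    W = vt[:n_factors]                    # loadings, factors x all features
    # Split the loadings back into one block per view.
    splits = np.cumsum([v.shape[1] for v in views])[:-1]
    return Z, np.split(W, splits, axis=1), centred

def variance_explained(Z, Ws, centred):
    """Fraction of variance explained, one row per view and one column per factor."""
    r2 = np.zeros((len(centred), Z.shape[1]))
    for m, (Y, W) in enumerate(zip(centred, Ws)):
        total = np.sum(Y ** 2)
        for k in range(Z.shape[1]):
            resid = Y - np.outer(Z[:, k], W[k])
            r2[m, k] = 1.0 - np.sum(resid ** 2) / total
    return r2

# Usage on simulated data: one factor drives both views, another only the second.
rng = np.random.default_rng(0)
shared, specific = rng.normal(size=(2, 50))
view_a = np.outer(shared, rng.normal(size=6)) + 0.1 * rng.normal(size=(50, 6))
view_b = (np.outer(shared, rng.normal(size=4)) + np.outer(specific, rng.normal(size=4))
          + 0.1 * rng.normal(size=(50, 4)))
Z, Ws, centred = joint_factors([view_a, view_b], n_factors=2)
r2 = variance_explained(Z, Ws, centred)  # inspect which factors are active in which view
```

A factor with a high value in every row of `r2` behaves like a shared axis of variation, while a factor that is high in only one row is specific to that view. Unlike this sketch, a dedicated method also has to cope with missing values and differing data types.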
Supplementary data are available at Bioinformatics online.
Summary: Heterogeneity within the self-renewal durability of adult hematopoietic stem cells (HSCs) challenges our understanding of the molecular framework underlying HSC function. Gene expression studies have been hampered by the presence of multiple HSC subtypes and contaminating non-HSCs in bulk HSC populations. To gain deeper insight into the gene expression program of murine HSCs, we combined single-cell functional assays with flow cytometric index sorting and single-cell gene expression assays. Through bioinformatic integration of these datasets, we designed an unbiased sorting strategy that separates non-HSCs away from HSCs, and single-cell transplantation experiments using the enriched population were combined with RNA-seq data to identify key molecules that associate with long-term durable self-renewal, producing a single-cell molecular dataset that is linked to functional stem cell activity. Finally, we demonstrated the broader applicability of this approach for linking key molecules with defined cellular functions in another stem cell system.
Here we report the use of diffusion maps and network synthesis from state transition graphs to better understand developmental pathways from single cell gene expression profiling. We map the progression of mesoderm towards blood in the mouse by single-cell expression analysis of 3,934 cells, capturing cells with blood-forming potential at four sequential developmental stages. By adapting the diffusion plot methodology for dimensionality reduction to single-cell data, we reconstruct the developmental journey to blood at single-cell resolution. Using transitions between individual cellular states as input, we develop a single-cell network synthesis toolkit to generate a computationally executable transcriptional regulatory network model that recapitulates blood development. Model predictions were validated by showing that Sox7 inhibits primitive erythropoiesis, and that Sox and Hox factors control early expression of Erg. We therefore demonstrate that single-cell analysis of a developing organ coupled with computational approaches can reveal the transcriptional programs that control organogenesis.
The transcriptome of single cells can reveal important information about cellular states and heterogeneity within populations of cells. Recently, single-cell RNA-sequencing has facilitated expression profiling of large numbers of single cells in parallel. To fully exploit these data, it is critical that suitable computational approaches are developed. One key challenge, especially pertinent when considering dividing populations of cells, is to understand the cell-cycle stage of each captured cell. Here we describe and compare five established supervised machine learning methods and a custom-built predictor for allocating cells to their cell-cycle stage on the basis of their transcriptome. In particular, we assess the impact of different normalisation strategies and the usage of prior knowledge on the predictive power of the classifiers. We tested the methods on previously published datasets and found that a PCA-based approach and the custom predictor performed best. Moreover, our analysis shows that the performance depends strongly on normalisation and the usage of prior knowledge. Only by leveraging prior knowledge in the form of cell-cycle annotated genes and by preprocessing the data using a rank-based normalisation is it possible to robustly capture the transcriptional cell-cycle signature across different cell types, organisms and experimental protocols.
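The two ingredients highlighted above, rank-based normalisation and restriction to cell-cycle annotated genes, can be sketched as follows. This is an illustration under assumptions, not the predictors compared in the study: ranks are computed within each cell and scaled to [0, 1], ties are broken arbitrarily, the annotated gene indices are placeholders, and a plain PCA projection stands in for the classifier input. The names `rank_normalise` and `pca_scores` are hypothetical.

```python
import numpy as np

def rank_normalise(expr):
    """Replace each cell's values (rows) by within-cell ranks scaled to [0, 1]."""
    # argsort of argsort yields 0-based ranks; ties are broken arbitrarily.
    ranks = expr.argsort(axis=1).argsort(axis=1).astype(float)
    return ranks / (expr.shape[1] - 1)

def pca_scores(mat, n_components=2):
    """Project cells onto the leading principal components of the given genes."""
    centred = mat - mat.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Usage on toy data: 10 cells x 12 genes with per-cell scale differences,
# mimicking varying sequencing depth between cells.
rng = np.random.default_rng(0)
expr = rng.random((10, 12)) * rng.uniform(1.0, 50.0, size=(10, 1))
cell_cycle_genes = [0, 3, 4, 7, 9]  # placeholder indices of annotated genes
features = pca_scores(rank_normalise(expr)[:, cell_cycle_genes], n_components=2)
```

Because ranks depend only on the ordering of genes within a cell, per-cell scale differences drop out, which is one plausible reason a rank-based normalisation transfers across protocols. The resulting `features` could then be passed to any supervised classifier trained on cells with known stages.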
Formation of the three primary germ layers during gastrulation is an essential step in the establishment of the vertebrate body plan and is associated with major transcriptional changes [1-5]. Global epigenetic reprogramming accompanies these changes [6-8], but the role of the epigenome in regulating early cell fate choice remains unresolved, and the coordination between different molecular layers is unclear. Here we describe the first single cell triple-omics map of chromatin accessibility, DNA methylation and RNA expression during the onset of gastrulation in mouse embryos. The initial exit from pluripotency coincides with the establishment of a global repressive epigenetic landscape, followed by the emergence of lineage-specific epigenetic patterns during gastrulation. Notably, cells committed to mesoderm and endoderm undergo widespread coordinated epigenetic rearrangements at enhancer marks, driven by TET-mediated demethylation, and a concomitant increase of accessibility. In striking contrast, the methylation and accessibility landscape of ectodermal cells is already established in the early epiblast. Hence, regulatory elements associated with each germ layer are either epigenetically primed or remodelled prior to cell fate decisions, providing the molecular logic for a hierarchical emergence of the primary germ layers.
Recent technological advances have enabled the profiling of multiple molecular layers at single cell resolution [9-13], providing novel opportunities to study the relationship between the transcriptome and epigenome during cell fate decisions. We applied scNMT-seq (single-cell Nucleosome, Methylome and Transcriptome sequencing [12]) to profile 1,105 single cells isolated from mouse embryos at four developmental stages (Embryonic Day (E) 4.5, E5.5, E6.5 and E7.5) which comprise the exit from pluripotency and primary germ layer specification (Figure 1a-d, Extended Data Fig. 1).
Cells were assigned to a specific lineage by mapping their RNA expression profiles to a comprehensive single-cell atlas [4] from the same stages, when available, or using marker genes (Extended Data Fig. 2). By performing Argelaguet et al.