Deciphering mechanisms of endocrine cell induction, specification and lineage allocation in vivo will provide valuable insights into how the islets of Langerhans are generated. Currently, it is ill defined how endocrine progenitors segregate into different endocrine subtypes during development. Here, we generated a novel neurogenin 3 (Ngn3)-Venus fusion (NVF) reporter mouse line that closely mirrors the transient endogenous Ngn3 protein expression. To define an in vivo roadmap of endocrinogenesis, we performed single cell RNA sequencing of 36,351 pancreatic epithelial and NVF+ cells during secondary transition. This allowed Ngn3-low endocrine progenitors, Ngn3-high endocrine precursors, Fev+ endocrine lineage and hormone+ endocrine subtypes to be distinguished and time-resolved, and molecular programs during the step-wise lineage restriction to be delineated. Strikingly, we identified 58 novel signature genes that show the same transient expression dynamics as Ngn3 in the 7,260 profiled Ngn3-expressing cells. The differential expression of these genes in endocrine precursors was associated with their cell-fate allocation towards distinct endocrine cell types. Thus, the generation of an accurately regulated NVF reporter allowed us to temporally resolve endocrine lineage development and to provide a fine-grained single cell molecular profile of endocrinogenesis in vivo.
Organ- and body-scale cell atlases have the potential to transform our understanding of human biology. To capture the variability present in the population, these atlases must include diverse demographics such as age and ethnicity from both healthy and diseased individuals. The growth in both size and number of single-cell datasets, combined with recent advances in computational techniques, for the first time makes it possible to generate such comprehensive large-scale atlases through integration of multiple datasets. Here, we present the integrated Human Lung Cell Atlas (HLCA) combining 46 datasets of the human respiratory system into a single atlas spanning over 2.2 million cells from 444 individuals across health and disease. The HLCA contains a consensus re-annotation of published and newly generated datasets, resolving under- or misannotation of 59% of cells in the original datasets. The HLCA enables recovery of rare cell types, provides consensus marker genes for each cell type, and uncovers gene modules associated with demographic covariates and anatomical location within the respiratory system. To facilitate the use of the HLCA as a reference for single-cell lung research and allow rapid analysis of new data, we provide an interactive web portal to project datasets onto the HLCA. Finally, we demonstrate the value of the HLCA reference for interpreting disease-associated changes. Thus, the HLCA outlines a roadmap for the development and use of organ-scale cell atlases within the Human Cell Atlas.
Recent advances in single-cell technologies have enabled high-throughput molecular profiling of cells across modalities and locations. Single-cell transcriptomics data can now be complemented by chromatin accessibility, surface protein expression, adaptive immune receptor repertoire profiling and spatial information. The increasing availability of single-cell data across modalities has motivated the development of novel computational methods to help analysts derive biological insights. As the field grows, it becomes increasingly difficult to navigate the vast landscape of tools and analysis steps. Here, we summarize independent benchmarking studies of unimodal and multimodal single-cell analysis across modalities to suggest comprehensive best-practice workflows for the most common analysis steps. Where independent benchmarks are not available, we review and contrast popular methods. Our article serves as an entry point for novices in the field of single-cell (multi-)omic analysis and guides advanced users to the most recent best practices.
Single-cell technologies have transformed our understanding of human tissues. Yet, studies typically capture only a limited number of donors and disagree on cell type definitions. Integrating many single-cell datasets can address these limitations of individual studies and capture the variability present in the population. Here we present the integrated Human Lung Cell Atlas (HLCA), combining 49 datasets of the human respiratory system into a single atlas spanning over 2.4 million cells from 486 individuals. The HLCA presents a consensus cell type re-annotation with matching marker genes, including annotations of rare and previously undescribed cell types. Leveraging the number and diversity of individuals in the HLCA, we identify gene modules that are associated with demographic covariates such as age, sex and body mass index, as well as gene modules changing expression along the proximal-to-distal axis of the bronchial tree. Mapping new data to the HLCA enables rapid data annotation and interpretation. Using the HLCA as a reference for the study of disease, we identify shared cell states across multiple lung diseases, including SPP1+ profibrotic monocyte-derived macrophages in COVID-19, pulmonary fibrosis and lung carcinoma. Overall, the HLCA serves as an example for the development and use of large-scale, cross-dataset organ atlases within the Human Cell Atlas.
A fine-tuned balance of glucocorticoid receptor (GR) activation is essential for organ formation, with disturbances influencing health outcomes. Excess GR activation in utero has been linked to brain-related negative outcomes, with unclear underlying mechanisms, especially regarding cell-type-specific effects. To address this, we used an in vitro model of fetal human brain, induced pluripotent-stem-cell-derived cerebral organoids, and mapped GR-activation effects using single-cell transcriptomics across development. Interestingly, neurons showed targeted regulation of differentiation- and maturation-related transcripts, suggesting a delay of these processes upon GR activation. Uniquely in neurons, differentially expressed transcripts were significantly enriched for genes associated with behavior-related phenotypes and disorders. This suggests that aberrant GR-
Single-cell RNA-seq datasets are often first analyzed independently without harnessing model fits from previous studies, and are then contextualized with public data sets, requiring time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public data sets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate contribution of data sets using ontologies for metadata. We propose an adaption of cross-entropy loss for cell type classification tailored to datasets annotated at different levels of coarseness. We demonstrate the utility of sfaira by training models across anatomic data partitions on 8 million cells.
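One way to picture a cross-entropy loss that tolerates labels of different coarseness is sketched below. This is an illustrative assumption, not sfaira's actual implementation: a coarse label is treated as the set of leaf cell types it covers, and the loss is the negative log of the total predicted probability on that set. The class names, the toy hierarchy in `DESCENDANTS`, and the function `coarse_aware_cross_entropy` are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_aware_cross_entropy(logits, allowed_leaves):
    """Negative log of the probability mass on all leaf classes
    compatible with the observed (possibly coarse) label.

    With a single allowed leaf this reduces to standard cross-entropy.
    """
    probs = softmax(logits)
    mass = sum(probs[i] for i in allowed_leaves)
    return -math.log(mass)


# Hypothetical leaf classes predicted by the classifier (index = position).
LEAVES = ["CD4 T cell", "CD8 T cell", "B cell"]

# Hypothetical toy hierarchy: each label maps to the leaf indices it covers.
DESCENDANTS = {
    "CD4 T cell": [0],
    "CD8 T cell": [1],
    "B cell": [2],
    "T cell": [0, 1],
    "lymphocyte": [0, 1, 2],
}

logits = [2.0, 1.0, 0.0]
fine_loss = coarse_aware_cross_entropy(logits, DESCENDANTS["CD4 T cell"])
coarse_loss = coarse_aware_cross_entropy(logits, DESCENDANTS["T cell"])
root_loss = coarse_aware_cross_entropy(logits, DESCENDANTS["lymphocyte"])
```

Under this sketch, a cell annotated only as "T cell" is not penalized for a prediction split between the two T cell subtypes, while a cell annotated at the finest level is scored exactly as in ordinary cross-entropy. A label that covers every leaf carries no information and contributes a loss of approximately zero.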
Exploratory analysis of single-cell RNA-seq data sets is currently based on statistical and machine learning models that are adapted to each new data set from scratch. A typical analysis workflow includes a choice of dimensionality reduction, selection of clustering parameters, and mapping of prior annotation. These steps typically require several iterations and can take up significant time in many single-cell RNA-seq projects. Here, we introduce sfaira, a single-cell data and model zoo that houses data sets as well as pre-trained models. The data zoo is designed to facilitate the fast and easy contribution of data sets, interfacing to a large community of data providers. Sfaira currently includes 233 data sets across 45 organs and 3.1 million cells in both human and mouse. Using these data sets, we have trained eight different example model classes, such as autoencoders and logistic cell type predictors. The infrastructure of sfaira is model agnostic and allows training and usage of many previously published models. Sfaira directly aids in exploratory data analysis by replacing embedding and cell type annotation workflows with end-to-end pre-trained parametric models. As further example use cases for sfaira, we demonstrate the extraction of gene-centric data statistics across many tissues, improved usage of cell type labels at different levels of coarseness, and an application for learning interpretable models through data regularization on extremely diverse data sets.