Organ- and body-scale cell atlases have the potential to transform our understanding of human biology. To capture the variability present in the population, these atlases must include diverse demographics such as age and ethnicity from both healthy and diseased individuals. The growth in both size and number of single-cell datasets, combined with recent advances in computational techniques, for the first time makes it possible to generate such comprehensive large-scale atlases through integration of multiple datasets. Here, we present the integrated Human Lung Cell Atlas (HLCA) combining 46 datasets of the human respiratory system into a single atlas spanning over 2.2 million cells from 444 individuals across health and disease. The HLCA contains a consensus re-annotation of published and newly generated datasets, resolving under- or misannotation of 59% of cells in the original datasets. The HLCA enables recovery of rare cell types, provides consensus marker genes for each cell type, and uncovers gene modules associated with demographic covariates and anatomical location within the respiratory system. To facilitate the use of the HLCA as a reference for single-cell lung research and allow rapid analysis of new data, we provide an interactive web portal to project datasets onto the HLCA. Finally, we demonstrate the value of the HLCA reference for interpreting disease-associated changes. Thus, the HLCA outlines a roadmap for the development and use of organ-scale cell atlases within the Human Cell Atlas.
Recent advances in single-cell technologies have enabled high-throughput molecular profiling of cells across modalities and locations. Single-cell transcriptomics data can now be complemented by chromatin accessibility, surface protein expression, adaptive immune receptor repertoire profiling and spatial information. The increasing availability of single-cell data across modalities has motivated the development of novel computational methods to help analysts derive biological insights. As the field grows, it becomes increasingly difficult to navigate the vast landscape of tools and analysis steps. Here, we summarize independent benchmarking studies of unimodal and multimodal single-cell analysis across modalities to suggest comprehensive best-practice workflows for the most common analysis steps. Where independent benchmarks are not available, we review and contrast popular methods. Our article serves as an entry point for novices in the field of single-cell (multi-)omic analysis and guides advanced users to the most recent best practices.
Single-cell technologies have transformed our understanding of human tissues. Yet, studies typically capture only a limited number of donors and disagree on cell type definitions. Integrating many single-cell datasets can address these limitations of individual studies and capture the variability present in the population. Here we present the integrated Human Lung Cell Atlas (HLCA), combining 49 datasets of the human respiratory system into a single atlas spanning over 2.4 million cells from 486 individuals. The HLCA presents a consensus cell type re-annotation with matching marker genes, including annotations of rare and previously undescribed cell types. Leveraging the number and diversity of individuals in the HLCA, we identify gene modules that are associated with demographic covariates such as age, sex and body mass index, as well as gene modules changing expression along the proximal-to-distal axis of the bronchial tree. Mapping new data to the HLCA enables rapid data annotation and interpretation. Using the HLCA as a reference for the study of disease, we identify shared cell states across multiple lung diseases, including SPP1+ profibrotic monocyte-derived macrophages in COVID-19, pulmonary fibrosis and lung carcinoma. Overall, the HLCA serves as an example for the development and use of large-scale, cross-dataset organ atlases within the Human Cell Atlas.
Personalized multipeptide vaccines are currently being discussed intensively for tumor immunotherapy. In order to identify epitopes (short, immunogenic peptides) suitable for eliciting a tumor-specific immune response, human leukocyte antigen-presented peptides are isolated by immunoaffinity purification from cancer tissue samples and analyzed by liquid chromatography-coupled tandem mass spectrometry (LC−MS/MS). Here, we present MHCquant, a fully automated, portable computational pipeline able to process LC−MS/MS data automatically and generate annotated, false discovery rate-controlled lists of (neo-)epitopes with associated relative quantification information. We show that MHCquant achieves higher sensitivity than established methods. Although MHCquant obtains the highest number of unique peptides, the rate of predicted MHC binders remains comparable to that of other tools. Reprocessing of the data from a previously published study resulted in the identification of several neoepitopes not detected by previously applied methods. MHCquant integrates tailor-made pipeline components with existing open-source software into a coherent processing workflow. Container-based virtualization permits execution of this workflow without complex software installation, execution on cluster/cloud infrastructures, and full reproducibility of the results. Integration with the data analysis workbench KNIME enables easy mining of large-scale immunopeptidomics data sets. MHCquant is available as open-source software along with accompanying documentation on our website at https://www.openms.de/mhcquant/.
β-hemoglobinopathies are caused by abnormal or absent production of hemoglobin in the blood due to mutations in the β-globin gene (HBB). Imbalanced expression of adult hemoglobin (HbA) causes severe anemia in patients suffering from the disease. However, individuals with naturally occurring mutations in the HBB cluster or related genes compensate for this disparity through γ-globin expression and subsequent fetal hemoglobin (HbF) production. Several preclinical and clinical studies have been performed in order to induce HbF by knocking down genes involved in HbF repression (KLF1 and BCL11A) or disrupting the binding sites of several transcription factors in the γ-globin gene (HBG1/2). In this study, we thoroughly compared the different CRISPR/Cas9 gene-disruption strategies by gene editing analysis and assessed their safety profile by RNA-seq and GUIDE-seq. All approaches reached therapeutic levels of HbF after gene editing and showed gene expression similar to that of the control sample, while no significant off-targets were detected by GUIDE-seq. Likewise, all three gene editing platforms were established in the GMP-grade CliniMACS Prodigy, achieving outcomes similar to those of preclinical devices. Based on this comparative gene editing analysis, we concluded that BCL11A is the most clinically relevant approach while HBG1/2 could represent a promising alternative for the treatment of β-hemoglobinopathies.
Sickle cell disease (SCD) and β-thalassemia, commonly known as β-hemoglobinopathies, are inherited blood disorders caused by mutations in the human β-globin gene (HBB) 1-4. Under healthy conditions, adult human hemoglobin (HbA) consists of 2 α and 2 β chains, whereas fetal hemoglobin (HbF) expressed in early gestation comprises 2 α chains and 2 γ chains. Notably, HbF was observed to bind oxygen with greater affinity than HbA, being functional when reactivated in adults 3,5,6.
Recent studies have generated substantial experimental evidence that HbF reactivation by gene disruption of specific transcription factors and regulators could provide a therapeutic benefit for β-hemoglobinopathies 7. It has long been appreciated that KLF1 and BCL11A are key regulators involved in the process of γ- to β-globin switching and that the repression of these genes leads to HbF resurgence 6-11. Interestingly, healthy individuals with a benign genetic condition, hereditary persistence of fetal hemoglobin (HPFH), were observed to exhibit persistent production of functional HbF 4,10,12,13. HPFH is caused by large deletions in the δ- and β-globin genes, or point mutations in the γ-globin promoter and γ-globin repressors, such as KLF1 and BCL11A 5. Importantly,
We present the AIMe registry, a community-driven reporting platform for AI in biomedicine. It aims to enhance the accessibility, reproducibility and usability of biomedical AI models, and allows future revisions by the community.
Single-cell RNA-seq datasets are often first analyzed independently without harnessing model fits from previous studies, and are then contextualized with public datasets, requiring time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public datasets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate contribution of datasets using ontologies for metadata. We propose an adaptation of cross-entropy loss for cell type classification tailored to datasets annotated at different levels of coarseness. We demonstrate the utility of sfaira by training models across anatomic data partitions on 8 million cells.
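One way such a loss could work is sketched below. This is an illustrative assumption, not sfaira's actual implementation: the classifier predicts probabilities over the finest (leaf) cell types, and a cell annotated with a coarse label is scored by the total probability assigned to all leaves under that label. The toy ontology, `LEAVES`, `DESCENDANTS`, `label_to_mask` and `coarse_cross_entropy` are hypothetical names introduced only for this example.

```python
import numpy as np

# Hypothetical toy ontology: three leaf cell types and one coarse parent label.
LEAVES = ["CD4 T cell", "CD8 T cell", "B cell"]
DESCENDANTS = {
    "CD4 T cell": ["CD4 T cell"],
    "CD8 T cell": ["CD8 T cell"],
    "B cell": ["B cell"],
    "T cell": ["CD4 T cell", "CD8 T cell"],  # coarse annotation
}


def label_to_mask(label):
    """Return a 0/1 vector marking the leaves compatible with `label`."""
    allowed = DESCENDANTS[label]
    return np.array([1.0 if leaf in allowed else 0.0 for leaf in LEAVES])


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def coarse_cross_entropy(logits, allowed_mask, eps=1e-12):
    """Mean negative log of the probability mass on the allowed leaves.

    With a one-hot mask this reduces to standard cross-entropy; with a
    coarse label, any compatible leaf prediction is rewarded.
    """
    probs = softmax(logits)
    mass = (probs * allowed_mask).sum(axis=-1)
    return float(-np.log(mass + eps).mean())


# Two cells with identical logits: one labeled finely, one coarsely.
logits = np.array([[2.0, 1.0, 0.0], [2.0, 1.0, 0.0]])
masks = np.stack([label_to_mask("CD4 T cell"), label_to_mask("T cell")])
loss = coarse_cross_entropy(logits, masks)
```

Under this assumption, a coarsely annotated dataset still contributes a training signal without penalizing the model for choosing between the fine-grained subtypes it cannot distinguish from the label alone.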