Preprint, 2023. DOI: 10.1101/2023.09.08.555192

Evaluating the Utilities of Foundation Models in Single-cell Data Analysis

Tianyu Liu,
Kexing Li,
Yuge Wang
et al.

Abstract: Large Language Models (LLMs) have made significant strides in both industrial and scientific domains. In this paper, we evaluate the performance of LLMs in single-cell sequencing data analysis through comprehensive experiments across eight downstream tasks pertinent to single-cell data. By comparing seven different single-cell LLMs with task-specific methods, we found that single-cell LLMs may not consistently outperform task-specific methods across all tasks. However, the emergent abilities and the successful app…
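To make the benchmark setup concrete, here is a minimal sketch of how one clustering task in such an evaluation might be scored: cluster cells on a model-derived embedding, then compare the clusters against annotated cell types. The embedding key `X_emb`, the label column `cell_type`, and the input file name are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: scoring one clustering task in a single-cell benchmark.
# Assumes an AnnData file with a model embedding in .obsm["X_emb"] (e.g., from
# scGPT or scVI) and ground-truth labels in .obs["cell_type"]; both names are
# illustrative assumptions.
import scanpy as sc
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

adata = sc.read_h5ad("benchmark_dataset.h5ad")  # hypothetical input file

# Build a neighbor graph on the embedding and cluster with Leiden,
# a standard pipeline for single-cell clustering.
sc.pp.neighbors(adata, use_rep="X_emb")
sc.tl.leiden(adata, key_added="leiden")

# Compare predicted clusters to annotated cell types.
ari = adjusted_rand_score(adata.obs["cell_type"], adata.obs["leiden"])
nmi = normalized_mutual_info_score(adata.obs["cell_type"], adata.obs["leiden"])
print(f"ARI = {ari:.3f}, NMI = {nmi:.3f}")
```

Agreement metrics such as ARI and NMI are label-permutation invariant, which makes them a common choice when comparing clusterings from different embeddings against the same annotation.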

Cited by 8 publications (2 citation statements)
References 144 publications

“…JOINTLY performs on par with state-of-the-art batch integration tools, such as scVI [3] and Harmony [2], in clustering tasks and has a similar trade-off between biological heterogeneity and batch mixing as scVI. In line with a recent benchmark [47], we found that JOINTLY and several task-specific models outperformed scGPT, a foundational single-cell RNA-sequencing model. As a future perspective, we envision that the performance of JOINTLY can be improved even further by initialising the algorithm using cell type labels.…”
Section: Discussion (supporting)
confidence: 85%
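The "biological heterogeneity versus batch mixing" trade-off mentioned above is typically quantified with paired metrics on the integrated embedding. Below is a minimal sketch using silhouette scores; the embedding key `X_emb`, the columns `cell_type` and `batch`, and the file name are assumptions for illustration, and dedicated suites such as scib compute more refined variants of these scores.

```python
# Hypothetical sketch: crude bio-conservation vs. batch-mixing scores on an
# integrated embedding. Key and column names are illustrative assumptions.
import scanpy as sc
from sklearn.metrics import silhouette_score

adata = sc.read_h5ad("integrated_dataset.h5ad")  # hypothetical input file
emb = adata.obsm["X_emb"]

# Bio-conservation: a HIGH silhouette w.r.t. cell types means cell types
# remain well separated after integration.
bio_score = silhouette_score(emb, adata.obs["cell_type"])

# Batch mixing: a LOW silhouette w.r.t. batch means batches are well mixed,
# so we report one minus the normalized score as a rough mixing proxy.
batch_sil = silhouette_score(emb, adata.obs["batch"])
batch_mixing = 1 - (batch_sil + 1) / 2  # silhouette in [-1, 1] -> [0, 1]

print(f"bio-conservation = {bio_score:.3f}, batch mixing = {batch_mixing:.3f}")
```

A method that maximizes batch mixing alone can collapse genuine cell-type differences, which is why the two scores are reported as a trade-off rather than combined into a single number.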
“…The diversity and complexity of these tasks are useful to thoroughly probe the model's performance and to evaluate the robustness of the learned representation and the model's ability to generalize to complex predictive tasks. Current results are promising but not entirely replicated in independent benchmarks [45-50]. Notably, to date, none of these models account for spatial relationships of cells during training, with the exception of CellPLM [40], which, however, is trained on a limited dataset of 9 million dissociated and 2 million spatial transcriptomics cells [40] and is not fine-tuned on spatial tasks beyond gene imputation. We propose Nicheformer, a novel spatial omics foundation model to understand tissue dependencies.…”
Citation type: mentioning
confidence: 99%
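As a concrete illustration of the spatial relationships the excerpt says most models ignore, here is a minimal sketch of building a spatial neighbor graph from cell coordinates with squidpy, i.e. the kind of neighborhood structure a spatially aware model could condition on. The file name and parameter choices are assumptions for illustration.

```python
# Hypothetical sketch: constructing a cell-cell spatial neighborhood graph,
# the spatial context that models trained on dissociated cells do not see.
# File name and parameters are illustrative assumptions.
import scanpy as sc
import squidpy as sq

adata = sc.read_h5ad("spatial_dataset.h5ad")  # expects coords in .obsm["spatial"]

# k-nearest-neighbor graph over physical coordinates; each cell is linked
# to its 6 closest spatial neighbors.
sq.gr.spatial_neighbors(adata, coord_type="generic", n_neighs=6)

# The resulting adjacency lives in .obsp and could serve as neighborhood
# context for a spatially aware model.
print(adata.obsp["spatial_connectivities"].shape)
```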