2023
DOI: 10.1101/2023.07.18.549537
Preprint

Scalable querying of human cell atlases via a foundational model reveals commonalities across fibrosis-associated macrophages

Abstract: Single-cell RNA-seq (scRNA-seq) studies have profiled over 100 million human cells across diseases, developmental stages, and perturbations to date. A singular view of this vast and growing expression landscape could help reveal novel associations between cell states and diseases, discover cell states in unexpected tissue contexts, and relate in vivo cells to in vitro models. However, these require a common, scalable representation of cell profiles from across the body, a general measure of their similarity, and a…

Cited by 23 publications (25 citation statements)
References 58 publications (42 reference statements)
“…We modified the original TabNet implementation in a few crucial ways: scTab’s input data assumption is adapted to the single-cell setting, in particular, the input gene expression is size factor normalized to 10,000 counts per cell and log1p transformed. This common normalization for scRNA-seq data 7,23 cannot be replicated by the simple batch normalization layer used in the original TabNet architecture. We additionally modified the original TabNet architecture to improve computational efficiency, namely by reducing the number of feature and attention blocks (which we found unnecessary after profiling), and training dynamics for faster convergence (Methods).…”
Section: Results (mentioning)
confidence: 99%
“…Moreover, unlike in the original TabNet model, we normalized the input data before feeding it into the neural network. scRNA-seq data is often normalized to have 10,000 counts per cell and is then log1p transformed afterward 7,12,23 , we applied the same normalization for our scTab model on top of the simple batch normalization layer, which is used in the original TabNet model to normalize the input features, as such a non-linear normalization cannot be achieved by a simple batch normalization layer.…”
Section: Methods (mentioning)
confidence: 99%
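The normalization described in the statements above (scale each cell to 10,000 total counts, then apply log1p) can be sketched in a few lines of NumPy. This is a minimal illustration of the general recipe, not scTab's actual code: the function name `normalize_log1p`, the `target_sum` parameter, and the toy count matrix are all assumptions made for the example.

```python
import numpy as np


def normalize_log1p(counts, target_sum=10_000.0):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p.

    Hypothetical helper for illustration only; not taken from scTab.
    """
    counts = np.asarray(counts, dtype=np.float64)
    # Total counts per cell, kept as a column so it broadcasts over genes.
    totals = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for cells with no counts at all.
    safe_totals = np.where(totals == 0, 1.0, totals)
    scaled = counts / safe_totals * target_sum
    # log1p(x) = log(1 + x), so zero counts stay exactly zero.
    return np.log1p(scaled)


# Toy matrix (assumed data): 3 cells x 4 genes.
# Cell 1 has the same composition as cell 0 at 20x the sequencing depth;
# cell 2 is empty.
raw = np.array([
    [1, 3, 0, 6],
    [20, 60, 0, 120],
    [0, 0, 0, 0],
])
norm = normalize_log1p(raw)
```

In this sketch, cells 0 and 1 end up with identical normalized profiles because each row is divided by its own total. That per-cell scaling followed by a logarithm is the non-linear step that, as the quoted statement notes, a batch normalization layer (a per-feature affine transform) cannot reproduce.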
“…For the ovarian cancer dataset, scDECAF pipeline was run on discovery mode (sparse gene set selection) and gene sets were pruned down to 91 gene sets. To prune gene sets we used a lasso penalty λ = e^{-2} and UMAP as the input embedding.…”
Section: Methods (mentioning)
confidence: 99%
“…The identification of cell types, states and gene programs allows researchers to decipher the mechanisms of disease, empowering the development of new drugs and therapies. Cell types and states can be identified using various computational methods 2,3,4,5 that transfer annotations from single-cell references 6,7,8 , by means of tissue-specific cell type markers 9,10 or classifiers 3,4 . For example, Seurat V4 6 and SingleR 7 leverage existing reference scRNAseq datasets for cell type annotation, whereas CellAssign 9 and Garnett 10 can deliver the same task with cell type markers.…”
Section: Introduction (mentioning)
confidence: 99%
“…In this study, we assessed the zero-shot performance of two proposed foundation models in single-cell biology: Geneformer [5] and scGPT [6]. We selected these models as representative examples in a rapidly evolving field that includes other approaches like scBERT [4], scFoundation [7], SCimilarity [8], and GeneCompass [9]. Our assessment covers a range of tasks, including the utility of embeddings for cell type clustering, batch effect correction, and the effectiveness of the models’ input reconstruction based on the pretraining objectives (Fig.…”
Section: Introduction (mentioning)
confidence: 99%