2021
DOI: 10.1186/s13059-021-02452-6

Sfaira accelerates data and model reuse in single cell genomics

Abstract: Single-cell RNA-seq datasets are often first analyzed independently without harnessing model fits from previous studies, and are then contextualized with public data sets, requiring time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public data sets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate contribution of data sets using ontologies for metadata. We propose an adaption of cross-entropy loss for cell type classif…

Citations: cited by 21 publications (21 citation statements)
References: 71 publications
“…The HLCA core is publicly available as a data portal and a model repository to explore, download, and use as a reference for new datasets. As the atlassing community has multiple outlets for newly generated data, we made the atlas available in Sfaira [70], Zenodo [71], Azimuth [14], CellTypist [72], and FASTGenomics [73]. Mapping new datasets to the HLCA core using scArches can be done via interactive portals, such as FASTGenomics and Azimuth, as well as locally using our Zenodo model [74], which lends itself to integration into bioinformatics pipelines.…”
Section: Discussion (mentioning)
confidence: 99%
“…To test this, we modified the scVI encoder to employ a non-amortized formulation (see Methods), in which the parameters of the variational distribution are optimized for each cell individually. We trained expiMap, scVI, and non-amortized scVI to construct references with multiple atlases across five tissues obtained from Sfaira [53]. We observed that non-amortized versions of scVI consistently achieved superior or equal performance in data integration compared with expiMap, while remaining better than the default (amortized) scVI, which corroborated our hypothesis (Fig.…”
Section: Results (mentioning)
confidence: 99%
“…We leverage datasets from five different tissues including PBMCs (n=161,764) [28], heart (n=18641) [30], lung (n=65,662) [31], colon (n=34,772) [32], and liver (n=113,063) [33]. All datasets, except heart, were obtained from the Sfaira database [34], which includes cell type labels. Heart was obtained from the scVI package.…”
Section: Expimap - Online Methods (mentioning)
confidence: 99%
“…Advancement towards ultra-high throughput single-cell RNA sequencing (scRNA-seq) platforms and the concomitant development of computational algorithms required to analyze the data, permits the generation of organism-wide transcriptomic maps, resolved both in time and space [1,2]. If a reference atlas is available, new datasets can be annotated automatically thus introducing fast, data-driven and consistent labeling of the cells [3][4][5]. Unfortunately, the skeleton is minimally represented in most of these atlases, often with insufficiently detailed annotation of the skeletal lineage.…”
Section: Introduction (mentioning)
confidence: 99%