2020
DOI: 10.1101/2020.05.22.111161
Preprint

Benchmarking atlas-level data integration in single-cell genomics

Abstract: Cell atlases often include samples that span locations, labs, and conditions, leading to complex, nested batch effects in data. Thus, joint analysis of atlas datasets requires reliable data integration. Choosing a data integration method is a challenge due to the difficulty of defining integration success. Here, we benchmark 38 method and preprocessing combinations on 77 batches of gene expression, chromatin accessibility, and simulation data from 23 publications, altogether representing >1.2 million cells dis…

Citations: cited by 111 publications (148 citation statements)
References: 68 publications (79 reference statements)
“…There is an evident batch effect between the samples, with the cells of each sample clustering together regardless of their type (Figure 1C and 1F). Consistently with a recent benchmark (7), dataset alignment using Seurat 3 appears to overcorrect these batch effects, overlaying samples with little in common such as CD4+ dLN and CD8+ TILs (Figure 1D and 1G). In contrast, STACAS only aligns cells with similar states across samples, limiting the superposition of CD4+ with CD8+ cells (Figure 1E).…”
Section: Results
supporting
confidence: 82%
“…To handle more than two datasets, a guide tree based on pairwise batch similarities is used to dictate the batch integration order. While Seurat has proven very powerful for the removal of technical artifacts between replicated experiments or even different sequencing technologies (5), it tends to overcorrect batch effects and performs poorly when integrating heterogeneous datasets (7), where only a fraction of cell types are shared between individual samples. This is crucial for the creation of reference cell type-specific single-cell atlases where the datasets to integrate were obtained from different tissues or experimental conditions (e.g.…”
mentioning
confidence: 99%
“…Following the robust data integration performance of scArches, we investigated whether it can map queries to references across nominally stronger batch effects arising from tissues, and even species [7].…”
Section: Architectural Surgery Enables Integrating Cell Atlases Acros…
mentioning
confidence: 99%
“…Yet, query datasets and reference atlases typically comprise data generated in different labs, with differing experimental protocols and thus contain batch effects. Data integration methods are typically used to overcome these batch effects in reference construction [7]. However, these approaches require access to all datasets used to generate the reference, which can be prohibitive especially for human data due to legal restrictions on data sharing.…”
Section: Introduction
mentioning
confidence: 99%