2020
DOI: 10.2139/ssrn.3646565

Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data

Abstract: In single-cell RNA sequencing (scRNA-seq), doublets form when two cells are encapsulated into one reaction volume by chance. The existence of doublets, which appear to be, but are not, real cells, is a key confounder in scRNA-seq data analysis. Computational methods have been developed to detect doublets in scRNA-seq data; however, the scRNA-seq field lacks a comprehensive benchmarking of these methods, making it difficult for researchers to choose an appropriate method for their specific analysis needs. Here, w…

Cited by 14 publications (44 citation statements)
References 15 publications (16 reference statements)
“…In fact, all model-based simulators that learn a generative model from real data must ignore certain outlier cells that do not fit well to their model. Some outlier cells could either represent an extremely rare cell type or are just “doublets” [93–96], artifacts resulted from single-cell sequencing experiments. Hence, our stance is that ignorance of outlier cells is a sacrifice that every simulator has to make; the open question is the degree to which outlier cells should be ignored, and proper answers to this question must resort to statistical model selection principles.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…In fact, all model-based simulators that learn a generative model from real data must ignore certain outlier cells that do not fit well to their model. Some outlier cells could either represent an extremely rare cell type or are just "doublets" [96][97][98][99], artifacts resulted from single-cell sequencing experiments.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…The scran package findMarkers() function was used to identify marker genes up-regulated in each cluster or annotated cell type. The potential presence of doublet cells was investigated with the scran package doubletCluster() function and with the scDblFinder package (version 1.2.0) (Xi and Li, 2020). Consistent with the moderate loading of 10X wells, a low number of potential doublets was detected, and no further filtering was performed.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…According to the composition of doublets, doublets can be divided into two major classes: homotypic doublets, which originate from the same cell type, and heterotypic doublets which arise from distinct transcriptional cells generating an artificial hybrid transcriptome (McGinnis, Murrow and Gartner, 2019; Wolock, Lopez and Klein, 2019). Compared to homotypic doublets, heterotypic doublets are considered to have more impact on downstream analyses including dimensionality reduction, cell clustering, differential expression and cell developmental trajectories (Bernstein, Fong, Lam, Roy, Hendrickson and Kelley, 2020; Xi and Li, 2020). To reduce the number of doublets in experiments, decreasing the concentration of loaded cells is an effective control measure to obtain a lower doublet rate, but this approach also reduces the number of captured cells and dramatically increases the cost per sample (Bernstein, Fong, Lam, Roy, Hendrickson and Kelley, 2020; Zheng, Terry, Belgrader, Ryvkin, Bent, Wilson, Ziraldo, Wheeler, McDermott, Zhu et al, 2017).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
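Editor's illustration of the homotypic/heterotypic distinction quoted above: a minimal sketch, assuming a doublet can be modeled as the gene-wise sum of two cells' counts. That additive model, the toy count vectors, and the helper names `make_doublet` and `cosine` are all hypothetical and are not taken from the cited papers.

```python
import math

# Hypothetical toy counts for four genes; illustrative values, not real data.
T_CELL_A = [20, 18, 0, 1]
T_CELL_B = [22, 15, 1, 0]
B_CELL_A = [1, 0, 19, 21]


def make_doublet(cell_1, cell_2):
    """Model a doublet as the gene-wise sum of two cells' counts (an assumption)."""
    return [a + b for a, b in zip(cell_1, cell_2)]


def cosine(u, v):
    """Cosine similarity between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Homotypic: both cells come from the same (toy) cell type.
homotypic = make_doublet(T_CELL_A, T_CELL_B)
# Heterotypic: the cells come from different (toy) cell types.
heterotypic = make_doublet(T_CELL_A, B_CELL_A)

# Compare each doublet's profile with a genuine singlet of the first type.
homo_sim = cosine(homotypic, T_CELL_A)
hetero_sim = cosine(heterotypic, T_CELL_A)
print(f"homotypic vs singlet:   {homo_sim:.3f}")
print(f"heterotypic vs singlet: {hetero_sim:.3f}")
```

In this toy setting the homotypic doublet keeps nearly the same expression profile as a singlet of its type and differs mainly in total counts, while the heterotypic doublet has a hybrid profile resembling neither type. This is consistent with the quoted statement that heterotypic doublets have more impact on downstream analyses.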
“…Second, these techniques are only experimentally label doublets from different samples but ignore the kind of doublet generated by cells from the same sample or individual. Therefore several computational approaches have been developed to detect doublets in common scRNA-seq data, including these already generated data (Xi and Li, 2020). However, their results vary greatly, and there are noticeable differences even in the top-performing methods which were demonstrated in a benchmarking study (Xi and Li, 2020), so there is still a larger challenge in terms of accuracy due to the low concordance between individual methods and suboptimal accuracy of each method.…”
Section: Introduction
Citation type: mentioning
confidence: 99%