Annotating cell types is a critical step in single cell RNA-seq (scRNA-seq) data analysis. Several deconvolution methods have recently emerged to enable automated cell type identification. However, comprehensive evaluations of these methods that could provide practical guidelines are lacking. Moreover, it is not clear whether deconvolution methods originally designed for analyzing other omics data are adaptable to scRNA-seq analysis. In this study, we evaluated ten cell-type deconvolution methods publicly available as R packages. Eight of them are popular methods developed specifically for single cell research (Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, SCINA). The other two methods are repurposed from deconvoluting DNA methylation data: Linear Constrained Projection (CP) and Robust Partial Correlations (RPC). We conducted systematic comparisons on a wide variety of public scRNA-seq datasets as well as simulation data. We assessed accuracy through intra-dataset and inter-dataset predictions, robustness to practical challenges such as gene filtering and high similarity among cell types, and the capability to detect rare and unknown cell types. Overall, methods such as Seurat,
SingleR, CP, RPC and SingleCellNet performed well, with Seurat being the best at annotating major cell types. Also, Seurat, SingleR and CP are more robust against down-sampling. However,
Seurat has a major drawback in predicting rare cell populations, and it is suboptimal at differentiating cell types that are highly similar to each other, while SingleR and CP are much better in these aspects.

Single cell RNA sequencing (scRNA-seq) has emerged as a powerful tool to enable the characterization of cell types and states in complex tissues and organisms at the single-cell level [1-5]. Annotating cell types among the cell clusters is a critical step before other downstream analyses, such as differential gene expression and pseudotime analysis [6][7][8][9]. Conventionally, a set of previously known cell-type specific markers is used to label the cell types of the clusters manually. This process is laborious and is often a rate-limiting step in scRNA-seq analysis. It is also prone to bias and errors: a marker may not be specific enough to differentiate the cell subpopulations in the same dataset, or it may not be generic enough to be applied from one study to another. Automating cell type labeling is critical to enhance reproducibility and consistency among single cell studies.
Recently, some deconvolution methods have emerged to systematically assign cell types in a new scRNA-seq dataset, based on existing annotations from another dataset. Instead of using only the top differentiating markers, most methods project or correlate the new cells onto similar cells in well-annotated reference datasets, by leveraging the whole transcriptome profiles. These decon...