Abstract: Background: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the…
“…At a detection rate threshold of 20% (commonly applied to single-cell datasets [11, 25]), most cell types in the Tabula Muris dataset expressed over a hundred ligands and receptors, with hematopoietic cell types expressing fewer ligands/receptors than other lineages (Supplementary Fig. 4a).…”
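The 20% detection-rate filter in the quoted passage can be made concrete with a short sketch. It assumes that a gene counts as "expressed" in a cell type when it has a nonzero count in at least 20% of that type's cells (treating the cutoff as inclusive is our assumption). The function names and the toy ligand/receptor gene names are hypothetical.

```python
import numpy as np

def detection_rate(counts, labels):
    """Fraction of cells in each cell type with a nonzero count for each gene.

    counts: (n_cells, n_genes) array; labels: cell-type name of every cell.
    Returns (cell_types, rates) with rates of shape (n_cell_types, n_genes).
    """
    labels = np.asarray(labels)
    cell_types = sorted(set(labels.tolist()))
    rates = np.vstack([(counts[labels == ct] > 0).mean(axis=0) for ct in cell_types])
    return cell_types, rates

def expressed_genes(counts, labels, genes, threshold=0.2):
    """Map each cell type to the genes detected in at least `threshold` of its cells."""
    cell_types, rates = detection_rate(counts, labels)
    return {ct: [g for g, r in zip(genes, row) if r >= threshold]
            for ct, row in zip(cell_types, rates)}

# Toy data: 5 cells x 3 hypothetical genes (two ligands, one receptor).
counts = np.array([[0, 3, 1],
                   [0, 0, 2],
                   [1, 0, 0],
                   [0, 0, 0],
                   [4, 1, 0]])
labels = ["B", "B", "T", "T", "T"]
genes = ["LigA", "LigB", "RecA"]
print(expressed_genes(counts, labels, genes))
```

Counting how many of a curated list of ligands and receptors survive this filter per cell type gives the kind of per-lineage tally the passage describes.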
Development of high-throughput single-cell sequencing technologies has made it cost-effective to profile thousands of cells from diverse samples containing multiple cell types. To study how these different cell types work together, here we develop NATMI (Network Analysis Toolkit for Multicellular Interactions). NATMI uses connectomeDB2020 (a database of 2293 manually curated ligand-receptor pairs with literature support) to predict and visualise cell-to-cell communication networks from single-cell (or bulk) expression data. Using multiple published single-cell datasets, we demonstrate how NATMI can be used to identify (i) the cell-type pairs that communicate the most (or most specifically) within a network, (ii) the most active (or most specific) ligand-receptor pairs within a network, (iii) putative highly communicating cellular communities and (iv) differences in intercellular communication when given cell types are profiled under different conditions. Furthermore, analysis of the Tabula Muris (organism-wide) atlas confirms our previous prediction that autocrine signalling is a major feature of cell-to-cell communication networks, while also revealing that hundreds of ligands and their cognate receptors are co-expressed in individual cells, suggesting a substantial potential for self-signalling.
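A minimal sketch of how the directed, weighted edges of such a network could be scored from per-cell-type mean expression. It assumes two weights along the lines described for NATMI: an expression weight (mean ligand expression in the sending cell type times mean receptor expression in the receiving cell type) and a specificity weight (the product of ligand and receptor specificities, where a gene's specificity in a cell type is its mean expression there divided by the sum of its means over all cell types). This is an illustration, not NATMI's code. `edge_weights`, the toy genes and the `autocrine` flag (sender equals receiver) are hypothetical.

```python
import numpy as np

def edge_weights(means, cell_types, genes, lr_pairs):
    """Score every (sender, ligand, receptor, receiver) edge with nonzero weight.

    means: (n_cell_types, n_genes) mean expression per cell type.
    lr_pairs: list of (ligand, receptor) gene names, e.g. from a curated database.
    """
    means = np.asarray(means, dtype=float)
    idx = {g: i for i, g in enumerate(genes)}
    totals = means.sum(axis=0)
    # Specificity: share of each gene's summed mean expression found in each cell type.
    spec = np.divide(means, totals, out=np.zeros_like(means), where=totals > 0)
    edges = []
    for lig, rec in lr_pairs:
        li, ri = idx[lig], idx[rec]
        for s, sender in enumerate(cell_types):
            for r, receiver in enumerate(cell_types):
                w = means[s, li] * means[r, ri]
                if w > 0:
                    edges.append({
                        "sender": sender, "receiver": receiver,
                        "ligand": lig, "receptor": rec,
                        "expression_weight": float(w),
                        "specificity_weight": float(spec[s, li] * spec[r, ri]),
                        "autocrine": sender == receiver,  # potential self-signalling
                    })
    return edges

means = np.array([[2.0, 0.0],   # cell type A: ligand only
                  [1.0, 3.0]])  # cell type B: ligand and receptor
for e in edge_weights(means, ["A", "B"], ["LigX", "RecY"], [("LigX", "RecY")]):
    print(e)
```

Sorting the edges by either weight gives a simple ranking of the most active or most specific interactions; edges flagged `autocrine` correspond to a ligand and its receptor co-expressed in the same cell type.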
“…An example in this fourth category is densityCut [32], which estimates the number of cell types in a given dataset by modelling the density of cell distributions to generate a hierarchical cluster tree and then selecting the clusters that are most stable in that tree. In this study, we propose an alternative stability-based approach for estimating the number of cell types that takes advantage of scCCESS, a random sampling-based ensemble deep clustering model previously proposed for scRNA-seq data clustering [33]. Our key assumption is that a clustering obtained with the optimal number of clusters would be more robust to small perturbations in the data, such as those introduced by random resampling, than clusterings obtained with suboptimal numbers of clusters.…”
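The stability idea in the passage above can be sketched in a few lines. Here, as an assumption for illustration, plain k-means on random subsets of genes stands in for the scCCESS ensemble of deep clustering models. For each candidate k, several perturbed clusterings are compared with the adjusted Rand index (ARI), and the k with the highest mean pairwise agreement is taken as the estimate. All function names, the 50% gene fraction and the toy data are hypothetical.

```python
import numpy as np

def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same cells."""
    _, a = np.unique(a, return_inverse=True)
    _, b = np.unique(b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)  # contingency table
    pairs = lambda x: x * (x - 1) / 2.0
    sum_ij = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(len(a))
    max_index = (sum_a + sum_b) / 2.0
    if max_index == expected:  # both labelings trivial: treat as perfect agreement
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; keeps the lowest-inertia restart."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def estimate_n_clusters(X, candidates, n_runs=10, feature_frac=0.5, seed=0):
    """Pick the k whose clusterings agree most across random gene subsets."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    m = max(1, int(feature_frac * n_features))
    stability = {}
    for k in candidates:
        runs = []
        for _ in range(n_runs):
            cols = rng.choice(n_features, size=m, replace=False)  # perturbation
            runs.append(kmeans(X[:, cols], k, rng))
        scores = [adjusted_rand_index(runs[i], runs[j])
                  for i in range(n_runs) for j in range(i + 1, n_runs)]
        stability[k] = float(np.mean(scores))
    best = max(stability, key=stability.get)
    return best, stability

# Toy data: three groups of 50 cells, each elevated in its own block of 10 genes.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(150, 30))
for j in range(3):
    X[j * 50:(j + 1) * 50, j * 10:(j + 1) * 10] += 10.0
k_hat, stab = estimate_n_clusters(X, [2, 3, 4, 5])
print(k_hat, stab)
```

Under-clustering can in principle also look stable if the same groups are always merged; the equidistant toy groups here avoid that, but real data need not.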
Background
A key task in single-cell RNA-seq (scRNA-seq) data analysis is to accurately detect the number of cell types in a sample, which can be critical for downstream analyses such as cell type identification. Various scRNA-seq clustering algorithms have been specifically designed to estimate the number of cell types automatically by optimising the number of clusters in a dataset. The lack of benchmark studies, however, complicates the choice among these methods.
Results
We systematically benchmark a range of popular clustering algorithms on estimating the number of cell types in a variety of settings, sampling from the Tabula Muris data to create scRNA-seq datasets with varying numbers of cell types, varying numbers of cells per cell type, and different cell type proportions. The large number of datasets enables us to assess the algorithms, which cover four broad categories of approaches, from multiple aspects using a panel of criteria. We further cross-compare performance on datasets with large numbers of cells using Tabula Muris and Tabula Sapiens data.
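Generating sub-datasets of this kind amounts to stratified subsampling of a labelled atlas. The sketch below assumes the atlas is represented only by a vector of cell-type labels and returns row indices into it; `sample_dataset` and its parameters are hypothetical and not taken from the benchmark's own code.

```python
import numpy as np

def sample_dataset(labels, n_types, n_cells, proportions=None, seed=0):
    """Return (row indices, chosen cell types) for a sub-dataset of a labelled atlas.

    n_types: number of cell types to include, chosen at random.
    n_cells: total number of cells in the sub-dataset.
    proportions: optional per-type fractions summing to 1; uniform if None.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    types = rng.choice(np.unique(labels), size=n_types, replace=False)
    if proportions is None:
        proportions = np.full(n_types, 1.0 / n_types)
    if not np.isclose(np.sum(proportions), 1.0):
        raise ValueError("proportions must sum to 1")
    sizes = np.floor(np.asarray(proportions) * n_cells).astype(int)
    remainder = int(n_cells - sizes.sum())
    sizes[:remainder] += 1  # hand out cells lost to rounding
    idx = []
    for t, size in zip(types, sizes):
        pool = np.flatnonzero(labels == t)
        if size > len(pool):
            raise ValueError(f"cell type {t!r} has only {len(pool)} cells")
        idx.append(rng.choice(pool, size=size, replace=False))
    return np.concatenate(idx), types

atlas_labels = ["a"] * 50 + ["b"] * 50 + ["c"] * 50 + ["d"] * 50
rows, chosen = sample_dataset(atlas_labels, n_types=3, n_cells=90,
                              proportions=[0.5, 0.3, 0.2], seed=3)
print(chosen, len(rows))
```

Looping over grids of `n_types`, `n_cells` and `proportions` (with several seeds each) yields a large collection of datasets with known ground truth.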
Conclusions
We identify the strengths and weaknesses of each method on multiple criteria, including the deviation of the estimate from the true number of cell types, the variability of the estimate, the concordance of cell clusterings with predefined cell types, running time and peak memory usage. We then summarise these results into a multi-aspect recommendation for users. The proposed stability-based approach for estimating the number of cell types is implemented in an R package and is freely available at https://github.com/PYangLab/scCCESS.
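Several of these criteria can be computed with the Python standard library alone. The sketch assumes an estimator is any callable that returns an integer number of cell types; it reports the deviation from the true number, its variability across datasets, wall-clock time via `time.perf_counter` and peak traced memory via `tracemalloc`. Clustering concordance is left out for brevity. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so memory used by external tools (for example an R process) is not counted. `benchmark` and the summary keys are hypothetical.

```python
import statistics
import time
import tracemalloc

def benchmark(estimator, datasets, true_ks):
    """Run `estimator` on each dataset and summarise its estimates.

    Deviation is (estimated - true) number of cell types per dataset.
    """
    deviations, times, peaks = [], [], []
    for data, k_true in zip(datasets, true_ks):
        tracemalloc.start()
        t0 = time.perf_counter()
        k_hat = estimator(data)
        times.append(time.perf_counter() - t0)
        _, peak = tracemalloc.get_traced_memory()  # bytes since start()
        tracemalloc.stop()
        peaks.append(peak)
        deviations.append(k_hat - k_true)
    return {
        "median_deviation": statistics.median(deviations),
        "mean_abs_deviation": statistics.fmean(abs(d) for d in deviations),
        "sd_deviation": statistics.stdev(deviations) if len(deviations) > 1 else 0.0,
        "total_seconds": sum(times),
        "peak_bytes": max(peaks),
    }

# Toy "estimator": counts distinct labels, standing in for a clustering method.
summary = benchmark(lambda d: len(set(d)), [[1, 2, 3], [1, 1, 2], [5, 6, 7, 8]], [3, 3, 4])
print(summary)
```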
An important research effort has recently been dedicated to understanding the decision mechanisms of deep neural networks. Among the resulting methods, Class Activation Mapping (CAM) and its variants have proved capable of providing useful insights into the decisions of Convolutional Neural Network (CNN) models. However, these methods remain limited to the supervised case, despite CNN-based advances in unsupervised tasks such as clustering. To fill this gap, we propose a new method called Grad-CeAM for centroid-based clustering methods applied to CNN representations. Through an experimental study, we show that our method can localize the discriminating features used by a CNN model to create its representation and that it can be used to explain cluster assignments. We also show that the method can be applied in different domains by providing use cases on time-series and image clustering.
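A minimal NumPy sketch of the Grad-CAM idea applied to a centroid-based assignment, under assumptions that are ours rather than the paper's: the CNN representation is the global average pool of the last feature maps, and the score explained for cluster k is the negative squared Euclidean distance to its centroid. Because that score has a closed-form gradient, no autodiff framework is needed. `centroid_cam` and `explain` are hypothetical names, and this is not the Grad-CeAM implementation.

```python
import numpy as np

def centroid_cam(activations, centroid):
    """Grad-CAM-style map for one cluster centroid.

    activations: (C, H, W) feature maps from the last convolutional layer.
    centroid: (C,) centroid in the pooled representation space.
    Assumed score: s = -||z - centroid||^2 with z = spatial mean of activations.
    """
    C, H, W = activations.shape
    z = activations.mean(axis=(1, 2))
    # ds/dA[c, i, j] = -2 (z_c - mu_c) / (H * W), identical at every position.
    grads = np.broadcast_to((-2.0 * (z - centroid) / (H * W))[:, None, None],
                            activations.shape)
    alpha = grads.mean(axis=(1, 2))  # Grad-CAM channel weights
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)  # ReLU(sum_c alpha_c A_c)
    return cam / cam.max() if cam.max() > 0 else cam

def explain(activations, centroids):
    """Assign to the nearest centroid and return (cluster index, normalised map)."""
    z = activations.mean(axis=(1, 2))
    k = int(np.argmin(((centroids - z) ** 2).sum(axis=1)))
    return k, centroid_cam(activations, centroids[k])

A = np.array([[[4.0, 0.0], [0.0, 0.0]],   # channel 0 fires top-left
              [[0.0, 0.0], [0.0, 1.0]]])  # channel 1 fires bottom-right
k, cam = explain(A, np.array([[2.0, 0.0], [0.0, 2.0]]))
print(k)
print(cam)
```

In this toy case the assigned centroid lies in the direction of more channel-0 activity, so the map highlights the region where channel 0 fires.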