Current approaches to single-cell transcriptomic analysis are computationally intensive and require assay-specific modeling, which limits their scope and generality. We propose a novel method that compares and clusters cells based on their transcript-compatibility read counts rather than on the transcript or gene quantifications used in standard analysis pipelines. In the reanalysis of two landmark yet disparate single-cell RNA-seq datasets, we show that our method is up to two orders of magnitude faster than previous approaches, provides accurate and in some cases improved results, and is directly applicable to data from a wide variety of assays.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-0970-8) contains supplementary material, which is available to authorized users.
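To make "transcript-compatibility read counts" concrete, here is a minimal sketch under stated assumptions. Each read is represented by the set of transcripts it is compatible with (an equivalence class), and a cell is summarized by how many reads fall into each class, with no step that resolves reads into per-transcript abundances. The toy reads, the helper names `tcc_vector`, `to_distribution` and `js_distance`, and the use of Jensen-Shannon distance to compare cells are all illustrative assumptions; the paper's exact distance and clustering choices may differ.

```python
import math
from collections import Counter


def tcc_vector(read_compat_sets):
    """Count reads per equivalence class (the set of transcripts a read is compatible with)."""
    return Counter(frozenset(s) for s in read_compat_sets)


def to_distribution(counts, classes):
    """Normalise one cell's TCCs over a shared, ordered list of equivalence classes."""
    total = sum(counts[c] for c in classes)  # Counter returns 0 for absent classes
    return [counts[c] / total for c in classes]


def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logs, so the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing; y > 0 whenever x > 0 here.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))


# Hypothetical pseudoalignment output: each read -> transcripts it is compatible with.
cell_a = tcc_vector([{"t1"}, {"t1", "t2"}, {"t1", "t2"}, {"t3"}])
cell_b = tcc_vector([{"t1"}, {"t1"}, {"t3"}, {"t3"}])

# Shared feature space: every equivalence class seen in either cell, in a fixed order.
classes = sorted(set(cell_a) | set(cell_b), key=sorted)
p = to_distribution(cell_a, classes)
q = to_distribution(cell_b, classes)
print(round(js_distance(p, q), 3))  # 0.558
```

Note that reads compatible with both `t1` and `t2` stay in their own feature `{t1, t2}` instead of being split between the two transcripts, which is what lets this style of comparison skip the quantification step. A pairwise distance matrix built this way could then be handed to any standard clustering routine.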
Single-cell computational pipelines involve two critical steps: organizing cells (clustering) and identifying the markers driving this organization (differential expression analysis). State-of-the-art pipelines perform differential analysis after clustering on the same dataset. We observe that because clustering forces separation, reusing the same dataset generates artificially low p-values and hence false discoveries. We introduce a valid post-clustering differential analysis framework which corrects for this problem. We provide software at https://github.com/jessemzhang/tn_test.
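The claim that reusing the clustered data "generates artificially low p-values" can be reproduced with a toy simulation. The sketch below is only an illustration of the problem, not the authors' correction: it draws one gene from a single Gaussian (so there are no real clusters), splits the cells with a 1-D 2-means run, and then applies a Welch-style test to the two resulting groups. The helper names `two_means_1d` and `welch_z` are hypothetical, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for large groups.

```python
import math
import random
import statistics


def two_means_1d(xs, iters=100):
    """Lloyd's algorithm with k=2 on 1-D data; returns a 0/1 label per value."""
    c0, c1 = min(xs), max(xs)  # deterministic initialisation at the extremes
    labels = []
    for _ in range(iters):
        labels = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]
        new_c0 = statistics.fmean(x for x, lab in zip(xs, labels) if lab == 0)
        new_c1 = statistics.fmean(x for x, lab in zip(xs, labels) if lab == 1)
        if (new_c0, new_c1) == (c0, c1):
            break
        c0, c1 = new_c0, new_c1
    return labels


def welch_z(a, b):
    """Welch statistic plus a two-sided p-value from a normal approximation."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(0)
# One gene measured in 500 cells drawn from a single Gaussian: no real clusters.
expr = [rng.gauss(0.0, 1.0) for _ in range(500)]

# "Double dipping": cluster on the data, then test on the same data.
labels = two_means_1d(expr)
upper = [x for x, lab in zip(expr, labels) if lab == 1]
lower = [x for x, lab in zip(expr, labels) if lab == 0]
z_dd, p_dd = welch_z(upper, lower)

# Reference: the same test on labels assigned independently of the data.
rand_labels = [rng.randrange(2) for _ in expr]
a = [x for x, lab in zip(expr, rand_labels) if lab == 1]
b = [x for x, lab in zip(expr, rand_labels) if lab == 0]
z_rand, p_rand = welch_z(a, b)

print(f"clustered labels: z={z_dd:.1f}, p={p_dd:.1e}")
print(f"random labels:    z={z_rand:.1f}, p={p_rand:.2f}")
```

Because the clustering step places its boundary exactly where it maximizes separation, the post-clustering test reports an extreme p-value even though every cell comes from the same distribution, while the data-independent labels give an ordinary one.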
Background: With the recent proliferation of single-cell RNA-Seq experiments, several methods have been developed for unsupervised analysis of the resulting datasets. These methods often rely on unintuitive hyperparameters and do not explicitly address the subjectivity associated with clustering.
Results: In this work, we present DendroSplit, an interpretable framework for analyzing single-cell RNA-Seq datasets that addresses both the clustering interpretability and clustering subjectivity issues. DendroSplit offers a novel perspective on the single-cell RNA-Seq clustering problem motivated by the definition of “cell type”, allowing us to cluster using feature selection to uncover multiple levels of biologically meaningful populations in the data. We analyze several landmark single-cell datasets, demonstrating both the method’s efficacy and computational efficiency.
Conclusion: DendroSplit offers a clustering framework that is comparable to existing methods in terms of accuracy and speed but is novel in its emphasis on interpretability. We provide the full DendroSplit software package at https://github.com/jessemzhang/dendrosplit.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2092-7) contains supplementary material, which is available to authorized users.
With the recent proliferation of single-cell RNA-Seq experiments, several methods have been developed for unsupervised analysis of the resulting datasets. These methods often rely on unintuitive hyperparameters and do not explicitly address the subjectivity associated with clustering. In this work, we present DendroSplit, an interpretable framework for analyzing single-cell RNA-Seq datasets that addresses both these issues. Under this framework, we cluster using feature selection to uncover multiple levels of biologically meaningful populations in the data. We analyze several landmark single-cell datasets, demonstrating both the method's efficacy and computational efficiency. We provide the full DendroSplit software package at https://github.com/jessemzhang/dendrosplit.
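The abstracts say DendroSplit clusters "using feature selection". One way to read this, sketched below purely under our own assumptions, is to walk a dendrogram top-down and keep a candidate split only if at least one gene clearly separates the two child clusters. The scoring rule (-log10 of the smallest per-gene Welch p-value), the threshold of 10, the normal approximation to the t distribution, and the names `welch_p`, `separation_score` and `should_split` are hypothetical and are not taken from the DendroSplit package.

```python
import math
import random
import statistics


def welch_p(a, b):
    """Two-sided Welch-test p-value using a normal approximation (large groups only)."""
    mean_diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        # Both groups constant: identical means give no evidence, different ones perfect separation.
        return 1.0 if mean_diff == 0.0 else 0.0
    return math.erfc(abs(mean_diff / se) / math.sqrt(2))


def separation_score(left, right):
    """Hypothetical score: -log10 of the smallest per-gene p-value between two
    candidate child clusters. Each cell is a list of per-gene expression values."""
    n_genes = len(left[0])
    best_p = min(
        welch_p([cell[g] for cell in left], [cell[g] for cell in right])
        for g in range(n_genes)
    )
    return -math.log10(max(best_p, 1e-300))  # clamp to avoid log10(0)


def should_split(left, right, threshold=10.0):
    """Keep a dendrogram split only if some gene separates the children strongly."""
    return separation_score(left, right) > threshold


rng = random.Random(1)
# Toy data: gene 0 is a marker (mean 3 vs 0); gene 1 is shared (mean 1 in both).
left = [[rng.gauss(3.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(50)]
right = [[rng.gauss(0.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(50)]
print(should_split(left, right))

# A random halving of one homogeneous population: no gene should stand out.
pool = [[rng.gauss(0.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(100)]
rng.shuffle(pool)
print(should_split(pool[:50], pool[50:]))
```

The appeal of a rule like this is interpretability: every accepted split comes with the gene that justified it. Because the children being compared come from clustering the same cells, the p-values serve here only as a heuristic separation score, not as calibrated significance levels, which is the reuse issue raised in the post-clustering differential analysis abstract.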