2016
DOI: 10.1186/s13059-016-0970-8

Fast and accurate single-cell RNA-seq analysis by clustering of transcript-compatibility counts

Abstract: Current approaches to single-cell transcriptomic analysis are computationally intensive and require assay-specific modeling, which limits their scope and generality. We propose a novel method that compares and clusters cells based on their transcript-compatibility read counts rather than on the transcript or gene quantifications used in standard analysis pipelines. In the reanalysis of two landmark yet disparate single-cell RNA-seq datasets, we show that our method is up to two orders of magnitude faster than …

Cited by 109 publications (95 citation statements)
References: 54 publications
“…Pseudoalignment divides (under reasonable assumptions 121) reads into equivalence classes, each consisting of the set of transcripts the read could have originated from. By counting the number of reads a cell has in each of the classes (transcript-compatibility counts, TCC), one obtains a high-dimensional representation of the cells with features other than explicitly quantified gene expression 122. While this feature space is not as biologically interpretable as gene expression space, it can produce 122 a similar cell clustering while sidestepping the time-consuming quantification task and avoiding the need to define a statistical model for read generation.…”
Section: Revealing the Vectors Of Cellular Identity
citation type: mentioning
confidence: 99%
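
The counting step described in this statement can be sketched in a few lines. This is a minimal illustration, assuming each read has already been assigned the set of transcripts it is compatible with (for example by a pseudoaligner); the function name `transcript_compatibility_counts` and the toy transcript IDs are hypothetical and do not come from any tool's API.

```python
from collections import Counter


def transcript_compatibility_counts(read_compat_sets):
    """Count reads per equivalence class for a single cell.

    read_compat_sets: iterable with one entry per read, each entry being
    the collection of transcript IDs that read is compatible with.
    Returns a Counter keyed by frozenset of transcript IDs.
    """
    counts = Counter()
    for transcripts in read_compat_sets:
        eq_class = frozenset(transcripts)
        if eq_class:  # reads compatible with no transcript are skipped
            counts[eq_class] += 1
    return counts


# Toy cell with five reads (hypothetical transcript IDs).
cell_reads = [
    {"t1", "t2"},
    {"t2", "t1"},        # same equivalence class as the first read
    {"t3"},
    set(),               # unmapped read, ignored
    {"t1", "t2", "t3"},
]
tcc = transcript_compatibility_counts(cell_reads)
```

Using a `frozenset` as the key means two reads fall in the same class exactly when they are compatible with the same set of transcripts, regardless of order.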
“…By counting the number of reads a cell has in each of the classes (transcript-compatibility counts, TCC), one obtains a high-dimensional representation of the cells with features other than explicitly quantified gene expression 122. While this feature space is not as biologically interpretable as gene expression space, it can produce 122 a similar cell clustering while sidestepping the time-consuming quantification task and avoiding the need to define a statistical model for read generation. Therefore, this approach reverses the order of quantification and clustering: one first clusters cells in the TCC space, and then quantifies gene expression only from representative cells in each cluster, or pooled data from the entire cluster, to assign a biological interpretation to the clusters.…”
Section: Revealing the Vectors Of Cellular Identity
citation type: mentioning
confidence: 99%
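
The reversed order described here (cluster first, quantify afterwards) might look like the sketch below. It is an illustration under stated assumptions, not the cited method: the Jensen-Shannon divergence is one plausible way to compare normalized TCC vectors, the threshold-based single-linkage grouping is a deliberately simple stand-in for a real clustering algorithm, and the cell names, class labels, and threshold value are all hypothetical.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw TCC counts into a probability distribution (zeros dropped)."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items() if v > 0}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two sparse distributions."""
    def kl_to_mixture(a, b):
        # KL(a || m) where m is the midpoint of a and b
        return sum(
            ak * math.log2(ak / (0.5 * (ak + b.get(k, 0.0))))
            for k, ak in a.items()
        )
    return 0.5 * kl_to_mixture(p, q) + 0.5 * kl_to_mixture(q, p)


def cluster_by_threshold(cells, threshold):
    """Single-linkage grouping: link cells whose divergence is below threshold."""
    dists = {name: normalize(c) for name, c in cells.items()}
    labels, next_label = {}, 0
    for name in cells:
        if name in labels:
            continue
        labels[name] = next_label
        stack = [name]
        while stack:
            cur = stack.pop()
            for other in cells:
                if other not in labels and \
                        js_divergence(dists[cur], dists[other]) < threshold:
                    labels[other] = next_label
                    stack.append(other)
        next_label += 1
    return labels


def pool_by_cluster(cells, labels):
    """Sum TCCs within each cluster, ready for a single quantification run."""
    pooled = {}
    for name, lab in labels.items():
        pooled.setdefault(lab, Counter()).update(cells[name])
    return pooled


# Toy data: four cells over four equivalence classes (hypothetical).
cells = {
    "cellA": Counter({"ec0": 90, "ec1": 10}),
    "cellB": Counter({"ec0": 85, "ec1": 15}),
    "cellC": Counter({"ec2": 70, "ec3": 30}),
    "cellD": Counter({"ec2": 60, "ec3": 40}),
}
labels = cluster_by_threshold(cells, threshold=0.1)
pooled = pool_by_cluster(cells, labels)
```

Only the pooled counts per cluster would then be passed to a quantification step, so the expensive part runs once per cluster rather than once per cell.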
“…For gene expression quantification some examples include STAR, RSEM, the Tuxedo Suite, and Kallisto [25–29]. These approaches map sequencing reads to a reference genome, a transcriptome index or perform de novo assembly without a reference genome to allow for expression quantification.…”
Section: Data Processing
citation type: mentioning
confidence: 99%
“…We illustrate a simple example using Kallisto, which quantifies expression quickly due to its use of transcriptome pseudoalignment [25,29]. The output from Kallisto will provide gene expression in transcripts per million (TPM), which is normalized for library depth.…”
Section: Figure
citation type: mentioning
confidence: 99%
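
To make the depth normalization concrete, the sketch below computes TPM from estimated counts and effective lengths using the standard definition (length-normalized rate, rescaled to sum to one million). The transcript names and numbers are hypothetical, and the function is an illustration of the formula rather than a reproduction of any tool's internals.

```python
import math


def tpm(est_counts, eff_lengths):
    """Transcripts per million from estimated counts and effective lengths."""
    # Reads per base of effective length for each transcript.
    rates = {t: est_counts[t] / eff_lengths[t] for t in est_counts}
    total = sum(rates.values())
    # Rescale so the values sum to one million.
    return {t: 1e6 * r / total for t, r in rates.items()}


# Hypothetical transcripts: estimated counts and effective lengths in bases.
counts = {"tx1": 100.0, "tx2": 300.0, "tx3": 0.0}
lengths = {"tx1": 1000.0, "tx2": 1500.0, "tx3": 500.0}

shallow = tpm(counts, lengths)

# Sequencing the same library ten times deeper scales every count by 10
# but leaves TPM unchanged, which is the sense in which it is depth-normalized.
deep = tpm({t: 10 * c for t, c in counts.items()}, lengths)
```

Here `tx2` has three times the counts of `tx1` but is 1.5 times longer, so its TPM is twice that of `tx1`.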
“…DTU testing on equivalence classes is fast and alleviates shortcomings in directly estimating transcript abundances before statistical testing. Indeed, performing analysis directly on equivalence classes has been proposed previously in the context of fast clustering of single-cell RNA-seq data 7.…”
Section: Introduction
citation type: mentioning
confidence: 99%