2018
DOI: 10.1371/journal.pcbi.1006378

clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets

Abstract: Clustering of genes and/or samples is a common task in gene expression analysis. The goals in clustering can vary, but an important scenario is that of finding biologically meaningful subtypes within the samples. This is an application that is particularly appropriate when there are large numbers of samples, as in many human disease studies. With the increasing popularity of single-cell transcriptome sequencing (RNA-Seq), many more controlled experiments on model organisms are similarly creating large gene exp…

citations
Cited by 53 publications
(43 citation statements)
references
References 36 publications
“…Note that contrasting the start and endpoints of a lineage is a special case of a more general capability of tradeSeq to compare the mean expression between any two regions of a given lineage. As such, this test can be considered a generalization of cluster-based discrete DE within a lineage (e.g., Risso et al. [25]).…”
Section: Methods (mentioning)
confidence: 99%
“…Specifically, for each gene, we extract a number of fitted values for each lineage (100 by default). We can then use resampling-based sequential ensemble clustering (RSEC), as implemented in clusterExperiment [25], to perform the clustering based on (the top principal components of) the standardized fitted values matrix (i.e., the fitted values are standardized to have zero mean and unit variance across cells for each gene). Importantly, we allow for any clustering algorithm that is built-in into clusterExperiment or chosen by the user to perform the clustering.…”
Section: Methods (mentioning)
confidence: 99%
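The standardization step described in the statement above can be sketched as follows. This is a minimal illustration in plain Python, not the tradeSeq or clusterExperiment implementation: the function name `standardize_rows`, the toy matrix, the use of the sample variance (n - 1 denominator), and the handling of constant genes are all assumptions. The later steps (principal components and RSEC clustering) are not shown.

```python
import math


def standardize_rows(fitted):
    """Scale each gene (row) to zero mean and unit variance across its columns."""
    standardized = []
    for row in fitted:
        n = len(row)  # assumes at least two fitted values per gene
        mean = sum(row) / n
        # Assumption: sample variance (n - 1); the quoted text does not specify.
        var = sum((x - mean) ** 2 for x in row) / (n - 1)
        sd = math.sqrt(var)
        if sd == 0:
            # Assumption: a constant gene carries no signal, so map it to zeros.
            standardized.append([0.0] * n)
        else:
            standardized.append([(x - mean) / sd for x in row])
    return standardized


# Hypothetical toy input: 3 genes x 5 fitted values
# (the quoted text extracts 100 fitted values per lineage by default).
fitted = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [10.0, 10.0, 10.0, 10.0, 10.0],
    [0.5, 0.1, 0.9, 0.3, 0.7],
]
z = standardize_rows(fitted)
```

After this step every non-constant gene contributes on the same scale, so genes with large absolute expression do not dominate the subsequent clustering.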
“…The algorithm partitions N cells into k clusters each represented by a centroid, or mean profile, for the cells in the kth cluster. This algorithm is commonly used not only on its own, but also as a component of ensemble clustering [9,10].…”
Section: Introduction (mentioning)
confidence: 99%
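The partitioning described in the statement above (N cells, k clusters, one centroid per cluster) can be illustrated with a small version of the standard Lloyd iteration. This is a sketch under assumptions, not the citing paper's implementation: the function names, the random initialization from the data points, the iteration cap, and the toy "cells" are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    rng = random.Random(seed)
    # Assumption: initialize centroids as k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not change
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: 6 "cells" with 2 features forming two separated groups.
cells = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centroids = kmeans(cells, k=2)
```

Each pass over the data costs on the order of N times k distance computations, which is why repeated runs become expensive for large N.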
“…For large enough data, k-means can be slow or completely fail if a user lacks sufficient computational resources. Ensemble clustering approaches that depend on the use of k-means [9,10] run it multiple times (e.g., with different parameter values or on a different data subset) limiting the usability of these packages for large scRNA-seq datasets [11]. We note that our goal here is not to debate the relative merits of k-means as a clustering algorithm - k-means is a well-established method, which has been thoroughly investigated [12] - but to provide users with the ability to use the popular k-means algorithm on large single-cell datasets.…”
Section: Introduction (mentioning)
confidence: 99%
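One generic way that results from repeated runs on different data subsets can be combined is a co-clustering proportion: for each pair of cells, the fraction of runs containing both in which they received the same label. The sketch below is an illustration of that general idea only, not the exact procedure of RSEC or of the packages cited as [9,10]; the function name `co_clustering` and the toy runs are hypothetical.

```python
def co_clustering(n_items, runs):
    """Proportion of runs in which each pair of items shares a cluster label.

    `runs` is a list of (indices, labels) pairs: `indices` are the items
    sampled in that run and `labels` are their cluster labels in that run.
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for indices, labels in runs:
        for a, label_a in zip(indices, labels):
            for b, label_b in zip(indices, labels):
                sampled[a][b] += 1
                if label_a == label_b:
                    together[a][b] += 1
    # Pairs never sampled together have no estimate, so they are marked NaN.
    return [
        [
            together[a][b] / sampled[a][b] if sampled[a][b] else float("nan")
            for b in range(n_items)
        ]
        for a in range(n_items)
    ]


# Hypothetical toy input: 4 cells, 3 runs on different subsets.
# Label values are only meaningful within a run.
runs = [
    ([0, 1, 2, 3], [0, 0, 1, 1]),
    ([0, 1, 3], [1, 1, 0]),
    ([1, 2, 3], [0, 1, 1]),
]
prop = co_clustering(4, runs)
```

Because every run requires a full clustering of its subset, the total cost grows with the number of runs, which is the scalability concern raised in the statement above.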