2022
DOI: 10.1186/s13059-021-02590-x

Phiclust: a clusterability measure for single-cell transcriptomics reveals phenotypic subpopulations

Abstract: The ability to discover new cell phenotypes by unsupervised clustering of single-cell transcriptomes has revolutionized biology. Currently, there is no principled way to decide whether a cluster of cells contains meaningful subpopulations that should be further resolved. Here, we present phiclust (ϕclust), a clusterability measure derived from random matrix theory that can be used to identify cell clusters with non-random substructure, testably leading to the discovery of previously overlooked phenotypes.

Cited by 10 publications (6 citation statements)
References 65 publications
“…Input the data subset as the "target data" into ClusterDE. It is worth noting that ClusterDE does not provide an automatic decision about whether two clusters should be merged, unlike the methods that directly assess the quality of clusters [13][14][15][16][17]. Instead, ClusterDE focuses on identifying trustworthy post-clustering DE genes as potential cell-type marker genes, enabling researchers to gain biological insights into clusters by investigating the specific genes that distinguish the clusters.…”
Section: Practical Guidelines For ClusterDE Usage (mentioning)
confidence: 99%
“…In other words, the cluster-free DE genes and the post-clustering DE genes serve different purposes and are not conceptually comparable. Another stream of methods has been developed to assess the quality of clustering results, e.g., the "purity" of a cluster or if two clusters should be merged [13][14][15][16][17]. However, these methods do not provide a direct statistical test for identifying DE genes, and it remains difficult to determine the threshold for clustering quality above which double dipping is not a concern.…”
Section: Introduction (mentioning)
confidence: 99%
“…Deep learning methods are frequently used however often struggle to cluster the high dimensional and increasing size of modern scRNA-seq datasets. [37][38][39] The most widely used linear dimensionality reduction algorithm is the PCA (Principal Component Analysis). Significant principal components can be used for nonlinear dimensionality reduction and to dependably indicate sources of heterogeneity in a dataset.…”
Section: Dimensionality Reduction (mentioning)
confidence: 99%
“…To emphasize the general and important consequences of this form of EN for a diverse range of practical applications, we consider generic ensembles of random matrices with fixed margins. These ensembles, which include matrices with 0/1 (or equivalently ±1) and non-negative integer entries subject to global or local constraints, arise for instance in studies of multi-cell gene expression profiles [17], multiplex (online) social activity [18], multi-channel communication systems [19], complex networks [20], and multivariate time series in finance [16], neuroscience [21] or other disciplines. Our results imply that, in many practical situations, the assumption of EE is incorrect and leads to mathematically wrong conclusions.…”
Section: Introduction (mentioning)
confidence: 99%