2022
DOI: 10.1093/bib/bbac387

How does the structure of data impact cell–cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data

Abstract: Accurately identifying cell-populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify a highest performing metric. However, the ‘best-performing’ metric varies bet…

Cited by 8 publications (12 citation statements)
References 71 publications

“…To assess heterogeneity within clusters or relationships between clusters, similarity metrics or distances can be calculated between the cells [33] and displayed with qualitative or quantitative visuals which preserve these metrics, including hierarchical relationship diagrams such as dendrograms and trees [78, 79], or graph-based network diagrams [80, 81]. Higher-level diagrams that do not seek to display all point-wise information can also be used to represent the results of other inter-cluster analyses [68, 82].…”
Section: Final Thoughts and Discussion (mentioning)
confidence: 99%
“…However, this procedure is taken as a baseline [6,28], and there is little discussion of the logic behind this coupling. We do present analyses alternatively measuring distortions with the L1 metric, given its more desirable properties in higher dimensions than Euclidean (L2) distance (see above), but other choices of distance metrics are possible and, whether in ambient or reduced space, can provide different implications and interpretations of the dataset's properties [33]. In light of this, one might surmise that the non-linear methods instead learn other manifold-specific 'metrics' from cell neighborhoods by identifying 'biological geometries' (though this is not justified by the original authors [4,5]).…”
Section: Incoherences In the Dimensionality Reduction Process (mentioning)
confidence: 99%
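
The statement above contrasts the L1 and Euclidean (L2) metrics. As a minimal sketch, assuming toy four-gene expression vectors and hypothetical helper names (none of which come from the cited work), the following shows how the two metrics can disagree about which cell is the nearest neighbour:

```python
import math


def l1_distance(u, v):
    """Manhattan (L1) distance: sum of absolute per-gene differences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def l2_distance(u, v):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(query, candidates, distance):
    """Return the name of the candidate cell closest to the query."""
    return min(candidates, key=lambda name: distance(query, candidates[name]))


# Hypothetical toy expression vectors (4 genes), not real data.
query = [0, 0, 0, 0]
candidates = {
    "cell_b": [2, 2, 2, 2],  # small difference spread over every gene
    "cell_c": [5, 0, 0, 0],  # one large difference in a single gene
}

nearest_l1 = nearest(query, candidates, l1_distance)  # L1: 8 vs 5
nearest_l2 = nearest(query, candidates, l2_distance)  # L2: 4.0 vs 5.0
print(nearest_l1, nearest_l2)
```

L2 squares each difference, so it penalises one large per-gene gap more heavily than L1 does. This is one simple way the choice of metric can change neighbourhoods and therefore downstream interpretation.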
“…A recent study [18] demonstrated that for single-cell RNA-seq datasets, correlation-based metrics outperformed distance-based metrics. This point was further illustrated by [57], where many distance metrics were benchmarked for clustering scRNA-seq data. The authors showed the significance of the impact of distance metrics on clustering and that correlation-based distances tend to perform better on average.…”
Section: Discussion (mentioning)
confidence: 99%
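
To make the distinction between correlation-based and distance-based metrics concrete, here is a small sketch under stated assumptions: the vectors are invented, the function names are hypothetical, and "correlation-based distance" is taken to mean one minus the Pearson correlation, which is one common definition and not necessarily the one benchmarked in the cited studies.

```python
import math


def euclidean_distance(u, v):
    """Distance-based metric: sensitive to the overall magnitude of expression."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_distance(u, v):
    """Correlation-based metric: 1 - Pearson r, sensitive only to profile shape."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    cov = sum(a * b for a, b in zip(du, dv))
    norm = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    if norm == 0:
        raise ValueError("Pearson correlation is undefined for constant vectors")
    return 1.0 - cov / norm


# Hypothetical toy profiles over 4 genes, not real data.
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]  # same profile as x at ten times the scale
z = [4, 3, 2, 1]      # similar magnitude to x, opposite trend

print(euclidean_distance(x, y), euclidean_distance(x, z))
print(pearson_distance(x, y), pearson_distance(x, z))
```

In this toy case Euclidean distance places `z` closer to `x`, while the correlation-based distance places `y` closer, because it ignores the scale difference that could arise, for example, from sequencing depth.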
“…Since the calculation of distance measures constituted the most computationally expensive part of the algorithm, we aimed to limit the number of used distance measures. Many studies have demonstrated the functionality of Cosine similarity and Kullback-Leibler divergence in clustering, as they can effectively measure the (dis)similarity of clusters, especially when dealing with high-dimensional data, such as natural language processing applications and sc-RNA seq analysis [21–23]. Therefore, despite the availability of other distance measures that demonstrate comparable performance, such as Jaccard and Motyka, these two measures have been specifically included in the study.…”
Section: Methods (mentioning)
confidence: 99%
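
The two measures named in the statement above can be sketched as follows. This is an illustrative sketch, not the citing authors' implementation: the function names are hypothetical, and the pseudocount used to turn counts into a probability distribution is an assumption made here so that the Kullback-Leibler divergence stays finite when a count is zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def to_distribution(counts, pseudocount=1.0):
    """Normalise non-negative counts to probabilities (pseudocount is an assumption)."""
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]


def kl_divergence(counts_p, counts_q, pseudocount=1.0):
    """Kullback-Leibler divergence KL(P || Q) in nats; note it is not symmetric."""
    p = to_distribution(counts_p, pseudocount)
    q = to_distribution(counts_q, pseudocount)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical toy count vectors over 3 genes, not real data.
a = [9, 1, 0]
b = [1, 1, 8]

print(cosine_similarity(a, b))
print(kl_divergence(a, b), kl_divergence(b, a))
```

Because KL divergence is asymmetric, swapping the arguments generally gives a different value; a symmetrised variant would be needed if a true distance is required.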