2019
DOI: 10.3934/fods.2019014

Power weighted shortest paths for clustering Euclidean data

Abstract: We study the use of power weighted shortest path metrics for clustering high dimensional Euclidean data, under the assumption that the data is drawn from a collection of disjoint low dimensional manifolds. We argue, theoretically and experimentally, that this leads to higher clustering accuracy. We also present a fast algorithm for computing these distances. 1. We prove that p-wspm's behave as expected for data satisfying the manifold hypothesis. That is, we show that the maximum distance between points in the s…

Citations: cited by 12 publications (17 citation statements)
References: 13 publications (28 reference statements)
“…Compared to the 1 geodesic, which is density-agnostic, the UPD prefers paths that avoid low-density regions. Like classical shortest paths, it may be computed efficiently using a Dijkstra-type algorithm [35]. A comparison of Euclidean distances and UPD is in Figure 1.…”
Section: Background On Ultrametric Path Distances (mentioning)
confidence: 99%
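
The "Dijkstra-type algorithm" mentioned in this statement can be sketched as follows. This is a minimal illustration, not the cited paper's implementation: it assumes the UPD is the minimax path distance (the smallest achievable longest edge over all paths in the complete Euclidean graph), and the function name `ultrametric_path_distances` and the sample points are hypothetical.

```python
import heapq
import math


def ultrametric_path_distances(points, source):
    """Minimax path distance from points[source] to every point.

    Assumption: the cost of a path is its longest Euclidean edge, and the
    distance is the smallest such cost over all paths in the complete graph.
    """
    n = len(points)
    best = [math.inf] * n
    best[source] = 0.0
    done = [False] * n
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if done[u]:
            continue
        done[u] = True
        for v in range(n):
            if done[v]:
                continue
            # Dijkstra relaxation with max() in place of addition.
            candidate = max(cost, math.dist(points[u], points[v]))
            if candidate < best[v]:
                best[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return best


# Illustrative data: two dense runs of points separated by one wide gap.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (7.0, 0.0), (8.0, 0.0)]
upd = ultrametric_path_distances(points, source=0)
print(upd)
```

In this toy example the distance to every point past the gap equals the width of the gap, while the Euclidean distances keep growing, which is the density-seeking behaviour the statement describes.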
“…Early uses of density-based distances for interpolation [45] led to the formulation of PWSPD in the context of unsupervised and semisupervised learning and applications [24,50,13,44,14,11,39,38,34,53,12]. It will occasionally be useful to think of $\ell_p^p(\cdot, \cdot)$ as the path distance in the complete graph on $X$ with edge weights $\|x_i - x_j\|^p$, which we shall denote $G_X^p$.…”
Section: ∼ F (X) (mentioning)
confidence: 99%
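
For orientation, the power weighted shortest path distance (PWSPD) that this statement refers to is usually written as below. This is a sketch of the standard definition under assumed notation (a finite sample $X$, paths $\pi$ through points of $X$, power $p \ge 1$); the citing paper's exact normalization may differ.

```latex
\ell_p(x, y) \;=\; \min_{\pi = (x_{i_0}, \dots, x_{i_T})}
  \left( \sum_{t=0}^{T-1} \bigl\| x_{i_t} - x_{i_{t+1}} \bigr\|^{p} \right)^{1/p},
\qquad x_{i_0} = x, \quad x_{i_T} = y .
```

Raising both sides to the power $p$ removes the outer root, so $\ell_p^p$ is an ordinary shortest path distance in a graph whose edges carry the weights $\|x_i - x_j\|^p$.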
“…, where $C_d$ is a constant that depends exponentially on the intrinsic dimensionality of the data [38]. When p is large, the PWSPDs derived from a complete graph are known in some cases to be the same as the PWSPDs derived from a kNN graph under particular scalings of k n. This provides a significant computational advantage, since kNN graphs are much sparser, and reduces the complexity of computing all-pairs PWSPD to $O(kn^2)$ [33].…”
Section: ∼ F (X) (mentioning)
confidence: 99%
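
The kNN restriction described in this statement can be illustrated with a short sketch. It is not the algorithm of [33]: the helper names `knn_graph` and `pwspd_from` are hypothetical, neighbours are found by brute force, and a plain binary-heap Dijkstra is used, so it only shows that on a small example the sparse graph with $O(kn)$ edges gives the same distances as the complete graph.

```python
import heapq
import math


def knn_graph(points, k, p):
    """Symmetrized kNN graph with edge weights ||x_i - x_j||^p (brute force)."""
    n = len(points)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            w = math.dist(points[i], points[j]) ** p
            adj[i][j] = w
            adj[j][i] = w
    return adj


def pwspd_from(points, source, k, p):
    """PWSPD from points[source] to all points, restricted to the kNN graph."""
    adj = knn_graph(points, k, p)
    dist = [math.inf] * len(points)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    # Undo the power so the result is on the scale of a distance.
    return [d ** (1.0 / p) for d in dist]


# Illustrative data: four equally spaced points on a line, p = 2.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
sparse = pwspd_from(pts, source=0, k=2, p=2)  # 2-nearest-neighbour graph
full = pwspd_from(pts, source=0, k=3, p=2)    # k = n - 1 is the complete graph
print(sparse, full)
```

With p = 2 the long direct edge from the first to the last point costs 9, while the three unit hops cost 3, so the optimal path only uses short edges that are already present in the kNN graph.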
“…To achieve both of these goals, we propose an embedding method based on the power weighted path metric. These metrics balance density and geometry considerations in the data, making them useful for many machine learning tasks such as clustering and semi-supervised learning (18-26). They have performed well in numerous applications especially imaging (24,25,27,28), but their usefulness for the analysis of single cell RNA sequence data remains unexplored.…”
Section: Introduction (mentioning)
confidence: 99%