Power weighted shortest paths for clustering Euclidean data

McKenzie, Daniel; Damelin, Steven B.

doi:10.3934/fods.2019014

Cited by 12 publications

(17 citation statements)

References 13 publications

(28 reference statements)


“…Compared to the $\ell^1$ geodesic, which is density-agnostic, the UPD prefers paths that avoid low-density regions. Like classical shortest paths, it may be computed efficiently using a Dijkstra-type algorithm [35]. A comparison of Euclidean distances and UPD is in Figure 1.…”

Section: Background On Ultrametric Path Distances (mentioning)

confidence: 99%

Hyperspectral Image Clustering with Spatially-Regularized Ultrametrics

Zhang, Murphy. 2021. Remote Sensing.

We propose a method for the unsupervised clustering of hyperspectral images based on spatially regularized spectral clustering with ultrametric path distances. The proposed method efficiently combines data density and spectral-spatial geometry to distinguish between material classes in the data, without the need for training labels. The proposed method is efficient, with quasilinear scaling in the number of data points, and enjoys robust theoretical performance guarantees. Extensive experiments on synthetic and real HSI data demonstrate its strong performance compared to benchmark and state-of-the-art methods. Indeed, the proposed method not only achieves excellent labeling accuracy, but also efficiently estimates the number of clusters. Thus, unlike almost all existing hyperspectral clustering methods, the proposed algorithm is essentially parameter-free.
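The citation statement above notes that the ultrametric path distance (UPD) can be computed with a Dijkstra-type algorithm. The following is a minimal sketch of that idea, not the cited authors' implementation. It assumes the complete graph on the data with Euclidean edge lengths and measures a path by its longest edge; the function name `ultrametric_path_distances` is hypothetical.

```python
import heapq
import math


def ultrametric_path_distances(points, source):
    """Minimax (bottleneck) path distance from `source` to every point.

    Assumption: complete graph on `points` with Euclidean edge lengths;
    a path's length is its longest edge rather than the sum of its edges.
    """
    n = len(points)
    dist = [math.inf] * n
    dist[source] = 0.0
    done = [False] * n
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if done[u]:
            continue  # stale heap entry
        done[u] = True
        for v in range(n):
            if done[v]:
                continue
            # Dijkstra relaxation with max(...) in place of the usual sum.
            cand = max(d, math.dist(points[u], points[v]))
            if cand < dist[v]:
                dist[v] = cand
                heapq.heappush(heap, (cand, v))
    return dist


# Two groups on a line, separated by a gap of length 8.
pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
print(ultrametric_path_distances(pts, 0))  # [0.0, 1.0, 1.0, 8.0, 8.0]
```

In this toy example every point in the far group is at distance 8 from the source, the width of the low-density gap, regardless of its Euclidean distance.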


“…Early uses of density-based distances for interpolation [45] led to the formulation of PWSPD in the context of unsupervised and semisupervised learning and applications [24,50,13,44,14,11,39,38,34,53,12]. It will occasionally be useful to think of $\ell_p^p(\cdot, \cdot)$ as the path distance in the complete graph on $\mathcal{X}$ with edge weights $\|x_i - x_j\|^p$, which we shall denote $\mathcal{G}^p_{\mathcal{X}}$.…”

Section: ∼ F (X) (mentioning)

confidence: 99%

“…, where $C_d$ is a constant that depends exponentially on the intrinsic dimensionality of the data [38]. When p is large, the PWSPDs derived from a complete graph are known in some cases to be the same as the PWSPDs derived from a kNN graph under particular scalings of k n. This provides a significant computational advantage, since kNN graphs are much sparser, and reduces the complexity of computing all-pairs PWSPD to $O(kn^2)$ [33].…”

Section: ∼ F (X) (mentioning)

confidence: 99%

Balancing Geometry and Density: Path Distances on High-Dimensional Data

Little, McKenzie, Murphy. 2020. Preprint (self-citation).

New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented. By illuminating the way these metrics balance density and geometry in the underlying data, we clarify their key parameters and discuss how they may be chosen in practice. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs, providing high probability guarantees on their equivalence that are near-optimal. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a wide range of data settings. Throughout the paper, our results require only that the underlying data is sampled from a low-dimensional manifold, and depend crucially on the intrinsic dimension of this manifold, rather than its ambient dimension.
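The citation statements for this paper describe the PWSPD as a path distance on the complete graph with edge weights given by Euclidean distance raised to the power p, and note that a kNN graph can sometimes give the same distances. The sketch below illustrates both on a toy example. It is an assumption-laden illustration rather than the authors' code: the function name `pwspd` and the symmetrised kNN construction are hypothetical, and Floyd-Warshall is swapped in for brevity, so it does not achieve the complexity quoted above.

```python
import math


def pwspd(points, p, k=None):
    """All-pairs power-weighted shortest-path distances.

    Distance = (min over paths of the sum of edge_length ** p) ** (1 / p).
    If `k` is given, only edges from each point to its k nearest
    neighbours are kept (symmetrised); otherwise the graph is complete.
    """
    n = len(points)
    euclid = [[math.dist(a, b) for b in points] for a in points]
    cost = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        cost[i][i] = 0.0
        if k is None:
            nbrs = range(n)
        else:
            # Position 0 of the sorted order is the point itself.
            nbrs = sorted(range(n), key=lambda j: euclid[i][j])[1:k + 1]
        for j in nbrs:
            cost[i][j] = cost[j][i] = euclid[i][j] ** p
    # Floyd-Warshall all-pairs shortest paths on the power-weighted graph.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = cost[i][m] + cost[m][j]
                if via < cost[i][j]:
                    cost[i][j] = via
    return [[c ** (1.0 / p) for c in row] for row in cost]


pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(pwspd(pts, 1)[0][3])        # p = 1 recovers the Euclidean distance
print(pwspd(pts, 2)[0][3])        # p = 2 favours many short hops
print(pwspd(pts, 2, k=2)[0][3])   # same value on a sparse kNN graph here
```

With p = 1 the triangle inequality makes the direct edge optimal, so the result is the Euclidean distance. With p = 2 the path through the intermediate points costs 1 + 1 + 1 instead of 9, and in this toy example the 2-nearest-neighbour graph already contains that path.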


“…To achieve both of these goals, we propose an embedding method based on the power weighted path metric. These metrics balance density and geometry considerations in the data, making them useful for many machine learning tasks such as clustering and semi-supervised learning (18-26). They have performed well in numerous applications especially imaging (24,25,27,28), but their usefulness for the analysis of single cell RNA sequence data remains unexplored.…”

Section: Introduction (mentioning)

confidence: 99%

Clustering and visualization of single-cell RNA-seq data using path metrics

Manousidaki, Little, Xie. 2021. Preprint.

Recent advances in single-cell technologies have enabled high-resolution characterization of tissue and cancer compositions. Although numerous tools for dimension reduction and clustering are available for single-cell data analyses, these methods often fail to simultaneously preserve local cluster structure and global data geometry. This article explores the application of power-weighted path metrics for the analysis of single cell RNA data. Extensive experiments on single cell RNA sequencing data sets confirm the usefulness of path metrics for dimension reduction and clustering. Distances between cells are measured in a data-driven way which is both density sensitive (decreasing distances across high density regions) and respects the underlying data geometry. By combining path metrics with multidimensional scaling, a low dimensional embedding of the data is obtained which respects both the global geometry of the data and preserves cluster structure. We evaluate the method both for clustering quality and geometric fidelity, and it outperforms other algorithms on a wide range of benchmarking data sets.
