A Statistical Performance Analysis of Graph Clustering Algorithms

Miasnikof, Pierre; Shestopaloff, Alexander Y.; Bonner, Anthony J.; Lawryshyn, Yuri

doi:10.1007/978-3-319-92871-5_11

Cited by 12 publications (17 citation statements)

References 39 publications

“…Here, we slightly modify the procedure to generate inter-cluster edges. In our previous article [35], we varied the proportion of vertices inside and outside each cluster that shared an edge. Here, we vary edge probabilities.…”

Section: Experimental Set-up and Results (mentioning, confidence: 99%)

A density-based statistical analysis of graph clustering algorithm performance

Miasnikof, Shestopaloff, Bonner et al. 2020. Journal of Complex Networks. Self-citation.

We introduce graph clustering quality measures based on comparisons of global, intra- and inter-cluster densities, an accompanying statistical significance test and a step-by-step routine for clustering quality assessment. Our work is centred on the idea that well-clustered graphs will display a mean intra-cluster density that is higher than global density and mean inter-cluster density. We do not rely on any generative model for the null model graph. Our measures are shown to meet the axioms of a good clustering quality function. They have an intuitive graph-theoretic interpretation, a formal statistical interpretation and can be tested for significance. Empirical tests also show they are more responsive to graph structure, less likely to break down during numerical implementation and less sensitive to uncertainty in connectivity than the commonly used measures.
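The three densities compared in this abstract can be computed in a few lines. The Python below is a minimal sketch, not the authors' code: it assumes a simple undirected, unweighted graph given as an edge list plus one cluster label per vertex, and it takes the inter-cluster density of a cluster of size n_k to be the number of edges leaving it divided by n_k(n - n_k). That inter-cluster normalisation is our reading, not a definition quoted from the paper, and the helper names `global_density` and `cluster_densities` are hypothetical.

```python
from statistics import mean


def global_density(n, edges):
    """K = |E| / (n(n-1)/2) for a simple undirected graph on n vertices."""
    return len(edges) / (n * (n - 1) / 2)


def cluster_densities(n, edges, labels):
    """Return (mean intra-cluster density, mean inter-cluster density).

    labels[v] is the cluster id of vertex v. Singleton clusters are skipped
    in the intra-cluster mean because their density is undefined.
    Assumption: inter-cluster density of cluster k = (edges leaving k) / (n_k * (n - n_k)).
    """
    sizes = {}
    for c in labels:
        sizes[c] = sizes.get(c, 0) + 1
    inside = {c: 0 for c in sizes}   # edges with both endpoints in cluster c
    leaving = {c: 0 for c in sizes}  # edges with exactly one endpoint in cluster c
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        if cu == cv:
            inside[cu] += 1
        else:
            leaving[cu] += 1
            leaving[cv] += 1
    intra = [inside[c] / (s * (s - 1) / 2) for c, s in sizes.items() if s > 1]
    inter = [leaving[c] / (s * (n - s)) for c, s in sizes.items() if s < n]
    return mean(intra), mean(inter)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels = [0, 0, 0, 1, 1, 1]
K = global_density(6, edges)
K_intra, K_inter = cluster_densities(6, edges, labels)
print(round(K, 3), round(K_intra, 3), round(K_inter, 3))  # 0.467 1.0 0.111
```

On this toy graph the mean intra-cluster density exceeds the global density, which in turn exceeds the mean inter-cluster density: the ordering the abstract associates with a well-clustered graph.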

“…Here, we streamline our statistical test. In our previous article [35], we conducted two separate tests. We formulated two null hypotheses, $\bar{K}_{\text{intra}} = K$ and $\bar{K}_{\text{inter}} = K$, to avoid the effects of a possible correlation between $\bar{K}_{\text{intra}}$ and $\bar{K}_{\text{inter}}$ ($K$ is a graph constant, not the result of a clustering).…”

Section: Hypothesis Testing (mentioning, confidence: 99%)

A density-based statistical analysis of graph clustering algorithm performance

Miasnikof, Shestopaloff, Bonner et al. 2020. Journal of Complex Networks. Self-citation.
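The quoted passage describes two separate tests of the null hypotheses that the mean intra-cluster density and the mean inter-cluster density each equal the global density K, with K treated as a fixed graph constant. A minimal sketch of such a pair of tests, assuming a one-sample t statistic computed over per-cluster densities, is shown below. The t statistic is a generic choice of ours; neither the authors' original tests nor their streamlined test is reproduced here, and the helper `one_sample_t` and the density values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(per_cluster, k_global):
    """t statistic for H0: mean of per-cluster densities equals K.

    K is treated as a fixed constant of the graph (it does not depend on
    the clustering). Assumption: a one-sample t test, chosen for illustration.
    """
    m = len(per_cluster)
    if m < 2:
        raise ValueError("need at least two clusters")
    s = stdev(per_cluster)
    if s == 0:
        raise ValueError("per-cluster densities have zero spread")
    return (mean(per_cluster) - k_global) / (s / sqrt(m))


# Hypothetical per-cluster densities for a graph with global density K = 0.3.
k = 0.3
t_intra = one_sample_t([0.9, 0.8, 0.85, 0.95], k)   # H0: mean intra-cluster density = K
t_inter = one_sample_t([0.05, 0.1, 0.02, 0.08], k)  # H0: mean inter-cluster density = K
print(round(t_intra, 2), round(t_inter, 2))
```

A large positive t for the intra-cluster densities together with a large negative t for the inter-cluster densities is the pattern expected from a well-clustered graph; converting t to a p-value would additionally need the t distribution with m - 1 degrees of freedom.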

“…The links between density and clustered patterns of connectivity were shown in Miasnikof et al. [26]. Under such a pattern of connectivity, it is expected that the densities of induced subgraphs obtained by sampling vertices within a neighborhood will exhibit, on average, higher densities than the graph's global density.…”

Section: Underlying Assumptions and Densities (mentioning, confidence: 93%)

A Statistical Test of Heterogeneous Subgraph Densities to Assess Clusterability

Miasnikof, Prokhorenkova, Shestopaloff et al. 2020. Lecture Notes in Computer Science. Self-citation.

Determining if a graph displays a clustered structure prior to subjecting it to any cluster detection technique has recently gained attention in the literature. Attempting to group graph vertices into clusters when a graph does not have a clustered structure is not only a waste of time but also leads to misleading conclusions. To address this problem, we introduce a novel statistical test, the δ-test, which is based on comparisons of local and global densities. Our goal is to assess whether a given graph meets the necessary conditions to be meaningfully summarized by clusters of vertices. We empirically explore our test's behavior under a number of graph structures. We also compare it to other recently published tests. From a theoretical standpoint, our test is more general, versatile and transparent than recently published competing techniques. It is based on the examination of intuitive quantities, applies equally to weighted and unweighted graphs and allows comparisons across graphs. More importantly, it does not rely on any distributional assumptions, other than the universally accepted definition of a clustered graph. Empirically, our test is shown to be more responsive to graph structure than other competing tests.
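The underlying assumption quoted above, that induced subgraphs sampled within a neighborhood are on average denser than the whole graph, can be illustrated directly. This is not the δ-test itself, whose statistic is not given here. It is a hypothetical sketch that picks vertices at random, takes each picked vertex together with its neighbors, and averages the densities of the induced subgraphs. The adjacency-set representation, the uniform sampling scheme and the helper names are our assumptions.

```python
import random
from statistics import mean


def induced_density(vertices, adj):
    """Edge density of the subgraph induced by `vertices` (adj: vertex -> set of neighbors)."""
    vs = list(vertices)
    n = len(vs)
    edges = sum(1 for i, u in enumerate(vs) for v in vs[i + 1:] if v in adj[u])
    return edges / (n * (n - 1) / 2)


def global_density(adj):
    """|E| / (n(n-1)/2); each undirected edge appears in two adjacency sets."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return m / (n * (n - 1) / 2)


def mean_neighborhood_density(adj, samples, seed=0):
    """Average density of subgraphs induced by a random vertex plus its neighbors.

    Isolated vertices are never picked, since a single vertex has no defined density.
    """
    rng = random.Random(seed)
    candidates = [v for v in adj if adj[v]]
    picks = [rng.choice(candidates) for _ in range(samples)]
    return mean(induced_density(adj[v] | {v}, adj) for v in picks)


# Two triangles, {0, 1, 2} and {3, 4, 5}, joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(global_density(adj), mean_neighborhood_density(adj, samples=50))
```

Every neighborhood in this toy graph induces a subgraph of density at least 2/3, above the global density of 7/15, which is the pattern the quoted passage expects under clustered connectivity.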

“…We then examine the relationship between mean Jaccard [10], Otsuka-Ochiai [17] and Burt's distances [2], on one hand, and intra-cluster density [14,13,15,16] within each cluster, on the other. Because these distances are pairwise measures, we compare their mean value for a given cluster to the cluster's internal density.…”

Section: Distance Measurements Under Study (mentioning, confidence: 99%)

“…At the cluster level, this distance takes the form of subsets of densely connected vertices. The link between clustering and density has been discussed in depth, recently [14,15,13,16]. In this article, our ultimate goal is to transform a graph's adjacency matrix into a $|V| \times |V|$ similarity or distance matrix $D = [d_{ij}]$, where the distance between each pair of vertices is given by the element $d_{ij}$.…”

Section: Introduction (mentioning, confidence: 99%)
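The transformation described in these two passages, from an adjacency matrix to a |V| × |V| distance matrix D = [d_ij] whose within-cluster means are then compared with intra-cluster density, can be sketched as follows. We assume closed neighborhoods (each vertex counts as its own neighbor), so that two vertices of a clique are at distance 0; the cited work may define neighborhoods differently. Jaccard distance is taken as 1 - |N_i ∩ N_j| / |N_i ∪ N_j| and Otsuka-Ochiai distance as 1 - |N_i ∩ N_j| / sqrt(|N_i| |N_j|). Burt's distance is not included, and `distance_matrix` and `mean_intra_distance` are hypothetical names.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def distance_matrix(A, kind="jaccard"):
    """Turn a symmetric 0/1 adjacency matrix A into a |V| x |V| distance matrix D."""
    n = len(A)
    # Assumption: closed neighborhoods, i.e. each vertex is its own neighbor.
    nb = [{j for j in range(n) if A[i][j] or i == j} for i in range(n)]
    D = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = len(nb[i] & nb[j])
        if kind == "jaccard":
            sim = shared / len(nb[i] | nb[j])
        elif kind == "ochiai":
            sim = shared / sqrt(len(nb[i]) * len(nb[j]))
        else:
            raise ValueError(f"unknown distance: {kind}")
        D[i][j] = D[j][i] = 1 - sim  # symmetric, zero diagonal
    return D


def mean_intra_distance(D, cluster):
    """Mean d_ij over all vertex pairs inside one cluster."""
    return mean(D[i][j] for i, j in combinations(sorted(cluster), 2))


# A triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
D = distance_matrix(A, "jaccard")
print(D[0][1], D[0][2], D[2][3])  # 0.0 0.25 0.5
print(mean_intra_distance(D, {0, 1, 2}))
```

A dense cluster such as the triangle yields a small mean intra-cluster distance (1/6 here), which is the relationship between mean distance and intra-cluster density that the quoted study examines.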

Graph Distances and Clustering

Miasnikof, Shestopaloff, Pitsoulis et al. 2020. Preprint. Self-citation.

With a view on graph clustering, we present a definition of vertex-to-vertex distance which is based on shared connectivity. We argue that vertices sharing more connections are closer to each other than vertices sharing fewer connections. Our thesis is centered on the widely accepted notion that strong clusters are formed by high levels of induced subgraph density, where subgraphs represent clusters. We argue these clusters are formed by grouping vertices deemed to be similar in their connectivity. At the cluster level (induced subgraph level), our thesis translates into low mean intra-cluster distances. Our definition differs from the usual shortest-path geodesic distance. In this article, we compare three distance measures from the literature. Our benchmark is the accuracy of each measure's reflection of intra-cluster density, when aggregated (averaged) at the cluster level. We conduct our tests on synthetic graphs generated using the planted partition model, where clusters and intra-cluster density are known in advance. We examine correlations between mean intra-cluster distances and intra-cluster densities. Our numerical experiments show that Jaccard and Otsuka-Ochiai offer very accurate measures of density, when averaged over vertex pairs within clusters.
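The planted partition model named in this abstract, and the varying of edge probabilities described in the first quotation on this page, can be sketched as a generator that joins each same-cluster vertex pair with probability p_in and each cross-cluster pair with probability p_out. This is a plain-Python illustration under that assumption, not the authors' generator; the names `planted_partition` and `observed_rates` are hypothetical. NetworkX provides comparable generators, `planted_partition_graph` for equal-sized clusters and `random_partition_graph` for arbitrary sizes.

```python
import random


def planted_partition(sizes, p_in, p_out, seed=None):
    """Sample an undirected planted partition graph.

    sizes: cluster sizes; vertices are numbered 0..n-1 cluster by cluster.
    Returns (n, edges, labels) with labels[v] the planted cluster of v.
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # Same planted cluster -> p_in, different clusters -> p_out.
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return n, edges, labels


def observed_rates(n, edges, labels):
    """Fraction of same-cluster and of cross-cluster vertex pairs that are edges."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    same_pairs = sum(1 for u in range(n) for v in range(u + 1, n) if labels[u] == labels[v])
    cross_pairs = n * (n - 1) // 2 - same_pairs
    return same / same_pairs, (len(edges) - same) / cross_pairs


n, edges, labels = planted_partition([20, 20, 20], p_in=0.8, p_out=0.05, seed=1)
print(observed_rates(n, edges, labels))
```

Because the planted clusters and the edge probabilities are known in advance, the observed same-cluster edge rate estimates the intra-cluster density that the distance measures are benchmarked against; lowering p_in or raising p_out makes the clusters progressively harder to recover.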