A density-based statistical analysis of graph clustering algorithm performance

Miasnikof, Pierre; Shestopaloff, Alexander Y.; Bonner, Anthony J.; Lawryshyn, Yuri; Pardalos, Panos M.

doi:10.1093/comnet/cnaa012

Cited by 13 publications

(22 citation statements)

References 47 publications

“…In a second step, we determined the relatedness of Level 1 responses by calculating a weighted Jaccard similarity between them (Ioffe, 2010), which has been found to perform well relative to other similarity measures in clustering approaches (e.g., Huang et al, 2008; Saad & Kamarudin, 2013; Strehl et al, 2000). In a third step, we represented the similarity matrix of Level 1 responses as a weighted network and extracted the components of risk using the Louvain modularity algorithm (Blondel et al, 2008), which has been found to compare favorably to other modularity and clustering algorithms (e.g., Emmons et al, 2016; Miasnikof et al, 2020; Pradana et al, 2020; Williams et al, 2019). One attractive feature of modularity detection algorithms, such as the Louvain algorithm, is that they also identify an optimal number of clusters.…”

Section: The Semantic Network Of Risk (mentioning)

confidence: 99%

On the semantic representation of risk

Wulff, Mata, 2021

Preprint

There is great theoretical and applied interest in understanding the psychology of risk - but what are defining features of lay people's semantic representation of this concept? We contribute a new approach to mapping the semantics of risk based on word associations that promises to provide insight into individual and group differences. Specifically, we introduce a novel mini-snowball word-association paradigm and use the tools of network and sentiment analysis to characterize the semantics of "risk" from 1,205 respondents (age range = 18-86; 50% female). We find that association-based representations extend those extracted from past survey- and text-based approaches to the semantics of risk. Crucially, we show that the semantics of risk vary systematically across demographic groups, with older and female respondents showing more negative connotations and mentioning more often certain types of activities (e.g., recreational activities) relative to younger adults and males, respectively. Our work has implications for the measurement of risk-related constructs by suggesting that "risk" means different things to different individuals.
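
The citation statement above relies on a weighted Jaccard similarity between responses. As a minimal sketch, assuming each response is represented as a dictionary of non-negative association weights (a hypothetical representation, not taken from the preprint), the commonly used min/max form of the measure could be computed as follows. The function name and the example data are illustrative only.

```python
def weighted_jaccard(x, y):
    """Weighted Jaccard similarity: sum of minima over sum of maxima.

    x and y map items to non-negative weights; a missing item has weight 0.
    """
    keys = set(x) | set(y)
    num = sum(min(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    den = sum(max(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    # Two empty vectors have no overlap to measure; return 0 by convention (assumption).
    return num / den if den > 0 else 0.0


# Hypothetical association counts for two responses.
a = {"danger": 2, "money": 1}
b = {"danger": 1, "loss": 1}

sim = weighted_jaccard(a, b)  # minima sum to 1, maxima sum to 4, so 0.25
```

A matrix of such pairwise similarities can then be treated as a weighted network, which is the input a modularity-based method such as Louvain expects.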

“…We then examine the relationship between mean Jaccard [10], Otsuka-Ochiai [17] and Burt's distances [2], on one hand, and intra-cluster density [14,13,15,16] within each cluster, on the other. Because these distances are pairwise measures, we compare their mean value for a given cluster to the cluster's internal density.…”

Section: Distance Measurements Under Study (mentioning)

confidence: 99%

“…At the cluster level, this distance takes the form of subsets of densely connected vertices. The link between clustering and density has been discussed in depth, recently [14,15,13,16]. In this article, our ultimate goal is to transform a graph's adjacency matrix into a $|V| \times |V|$ similarity or distance matrix $D = [d_{ij}]$, where the distance between each pair of vertices is given by the element $d_{ij}$.…”

Section: Introduction (mentioning)

confidence: 99%

Graph Distances and Clustering

Miasnikof, Shestopaloff, Pitsoulis et al., 2020

Preprint

Self Cite

With a view on graph clustering, we present a definition of vertex-to-vertex distance which is based on shared connectivity. We argue that vertices sharing more connections are closer to each other than vertices sharing fewer connections. Our thesis is centered on the widely accepted notion that strong clusters are formed by high levels of induced subgraph density, where subgraphs represent clusters. We argue these clusters are formed by grouping vertices deemed to be similar in their connectivity. At the cluster level (induced subgraph level), our thesis translates into low mean intra-cluster distances. Our definition differs from the usual shortest-path geodesic distance. In this article, we compare three distance measures from the literature. Our benchmark is the accuracy of each measure's reflection of intra-cluster density, when aggregated (averaged) at the cluster level. We conduct our tests on synthetic graphs generated using the planted partition model, where clusters and intra-cluster density are known in advance. We examine correlations between mean intra-cluster distances and intra-cluster densities. Our numerical experiments show that Jaccard and Otsuka-Ochiai offer very accurate measures of density, when averaged over vertex pairs within clusters.
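
To make the comparison described in this abstract concrete, the sketch below computes a Jaccard distance from shared connections and sets its cluster-level mean against intra-cluster density. It is an illustration under stated assumptions, not the authors' code: the use of closed neighbourhoods (each vertex counted among its own neighbours), the function names, and the toy graph of two triangles joined by one edge are all choices made here.

```python
from itertools import combinations


def jaccard_distance(adj, i, j):
    """1 - |N(i) & N(j)| / |N(i) | N(j)| on closed neighbourhoods (assumption)."""
    ni, nj = adj[i] | {i}, adj[j] | {j}
    return 1.0 - len(ni & nj) / len(ni | nj)


def mean_intra_cluster_distance(adj, cluster):
    """Average Jaccard distance over all vertex pairs inside the cluster."""
    pairs = list(combinations(sorted(cluster), 2))
    return sum(jaccard_distance(adj, i, j) for i, j in pairs) / len(pairs)


def intra_cluster_density(adj, cluster):
    """Edges inside the cluster divided by the number of possible edges."""
    members = set(cluster)
    n = len(members)
    # Each internal edge is seen from both endpoints, hence the division by 2.
    edges = sum(len(adj[v] & members) for v in members) // 2
    return edges / (n * (n - 1) / 2)


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

good, bad = {0, 1, 2}, {0, 1, 4}
good_stats = (mean_intra_cluster_distance(adj, good), intra_cluster_density(adj, good))
bad_stats = (mean_intra_cluster_distance(adj, bad), intra_cluster_density(adj, bad))
```

On this toy graph the dense triangle has the lower mean distance and the higher density, which is the inverse relationship the abstract reports testing on planted partition graphs.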

“…Our approach is more flexible and allows us to circumvent modularity's many shortcomings. These shortcomings have been well-documented in the literature (e.g., [11,1,29,30,31]). Furthermore, we choose to formulate our problem as a QUBO problem, in order to overcome computational intractability and benefit from new hardware developments.…”

Section: Introduction (mentioning)

confidence: 97%

“…Vertices that share more connections are defined as closer, more similar, to each other than to the ones with which they share fewer connections. Successful clustering results in vertices grouped into densely connected induced subgraphs (e.g., [29,30,31]). Figure 1 shows an example of a successful and an unsuccessful clustering.…”

Section: Introduction (mentioning)

confidence: 99%

Graph Clustering Via QUBO and Digital Annealing

Miasnikof, Hong, Lawryshyn, 2020

Preprint

Self Cite

This article empirically examines the computational cost of solving a known hard problem, graph clustering, using novel purpose-built computer hardware. We express the graph clustering problem as an intra-cluster distance or dissimilarity minimization problem. We formulate our problem as a quadratic unconstrained binary optimization problem and employ a novel computer architecture to obtain a numerical solution. Our starting point is a clustering formulation from the literature. This formulation is then converted to a quadratic unconstrained binary optimization formulation. Finally, we use a novel purpose-built computer architecture to obtain numerical solutions. For benchmarking purposes, we also compare computational performances to those obtained using a commercial solver, Gurobi, running on conventional hardware. Our initial results indicate the purpose-built hardware provides equivalent solutions to the commercial solver, but in a very small fraction of the time required.
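
The abstract does not spell out the formulation, so the following is only one plausible sketch of intra-cluster distance minimization written as a QUBO. It assumes binary variables indicating whether vertex i belongs to cluster c, an objective summing distances between vertices placed in the same cluster, and a quadratic penalty that forces each vertex into exactly one cluster. The penalty weight, the distance values, and the brute-force search (standing in for an annealer or a commercial solver) are all hypothetical.

```python
from itertools import product


def qubo_energy(x, dist, k, penalty):
    """Energy of a flat 0/1 assignment x, where x[i * k + c] == 1 puts vertex i in cluster c."""
    n = len(dist)
    # Objective: total distance between pairs assigned to the same cluster.
    intra = sum(dist[i][j] * x[i * k + c] * x[j * k + c]
                for c in range(k) for i in range(n) for j in range(i + 1, n))
    # Penalty: (sum_c x_ic - 1)^2 is zero only when vertex i has exactly one cluster.
    one_hot = sum((sum(x[i * k + c] for c in range(k)) - 1) ** 2 for i in range(n))
    return intra + penalty * one_hot


# Hypothetical distances: vertices 0, 1 are close; vertices 2, 3 are close.
dist = [[0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.1],
        [0.9, 0.9, 0.1, 0.0]]
n, k, penalty = 4, 2, 10.0

# Exhaustive search over all 2^(n*k) assignments; feasible only for toy sizes.
best = min(product((0, 1), repeat=n * k),
           key=lambda x: qubo_energy(x, dist, k, penalty))
best_energy = qubo_energy(best, dist, k, penalty)
labels = [max(range(k), key=lambda c: best[i * k + c]) for i in range(n)]
```

Because the objective has no constraints beyond the penalty term, the penalty weight must exceed any saving obtained by leaving a vertex unassigned; the value used here is simply large relative to the toy distances.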
