2019
DOI: 10.1145/3345951

Comparing Two Clusterings Using Matchings between Clusters of Clusters

Abstract: Clustering is a fundamental problem in data science, yet the variety of clustering methods and their sensitivity to parameters make clustering hard. To analyze the stability of a given clustering algorithm while varying its parameters, and to compare clusters yielded by different algorithms, several comparison schemes based on matchings, information theory, and various indices (Rand, Jaccard) have been developed. We go beyond these by providing a novel class of methods computing meta-clusters within each clust…

Cited by 6 publications (7 citation statements)
References 45 publications
“…Several research papers [10, 11] have analyzed the stability of a given clustering algorithm while varying its parameters, and to compare clusters yielded by different algorithms, using comparison schemes based on matchings, information theory, and use of various indices (Rand, Jaccard). This was generalized to accommodate many-to-many matchings between clusters, via the D-family-matching on the intersection graph, with D as the upper bound on the diameter of the graph induced by the clusters of any meta-cluster by [12]. While this problem is NP-complete and hard to approximate, a polynomial time, spanning tree based heuristic was presented.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…In this paper, we are dealing with clusters obtained through two different datasets which are therefore, non-uniform in size and could represent different metrics. We present a novel way of using a specialized edge-weighing formulation in intersection graph and to find the correspondences through maximum-weighted bipartite matching of the graph in section [5.2, 5.3], figure [12, 15]. It is noteworthy that our edge-weight metric can be generalized to compute similarity between two sets containing objects at different levels of hierarchies in a hierarchical dataset.…”
Section: Model and Techniques
Citation type: mentioning
confidence: 99%
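
The maximum-weighted bipartite matching mentioned in the statement above can be illustrated on a toy weight matrix. The sketch below is only an illustration: it searches all permutations exhaustively, which is viable for a handful of clusters at most, and the weights are made-up numbers, not the edge-weight formulation of the citing paper. The function name `max_weight_matching` is hypothetical.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Brute-force maximum-weight matching on a square weight matrix.

    Assumption: weights[i][j] is a hypothetical similarity between
    cluster i of the first clustering and cluster j of the second.
    """
    n = len(weights)
    best_total, best_pairs = float("-inf"), []
    # Try every one-to-one assignment of rows to columns.
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total = total
            best_pairs = [(i, perm[i]) for i in range(n)]
    return best_total, best_pairs


# Made-up similarities between three clusters on each side.
weights = [
    [5, 1, 0],
    [2, 4, 1],
    [0, 3, 6],
]
total, pairs = max_weight_matching(weights)
print(total, pairs)
```

For realistic numbers of clusters, a polynomial-time method such as the Hungarian algorithm would replace the exhaustive search.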
“…For this purpose, we applied a cluster matching framework, called D-family matching [22]. It first defines the "intersection graph" G of N_i and N_j as a bipartite graph where the vertices in the two partite sets correspond to the clusters of N_i and N_j. Each pair of clusters of N_i and N_j has an edge with the weight equal to the size of their intersection.…”
Section: RQ2: Inter-annotator Agreement
Citation type: mentioning
confidence: 99%
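
The intersection graph described in the statement above can be sketched in a few lines. This is a minimal illustration, assuming each clustering is given as a list of sets of item identifiers. The function name `intersection_graph` and the toy data are hypothetical, and dropping zero-weight edges is a choice made here for readability, not something the quoted text specifies.

```python
from itertools import product


def intersection_graph(clustering_a, clustering_b):
    """Return weighted edges {(i, j): |A_i ∩ B_j|} of the bipartite graph.

    Assumption: each clustering is a list of sets of hashable item ids.
    Pairs of clusters with an empty intersection get no edge.
    """
    edges = {}
    for (i, a), (j, b) in product(enumerate(clustering_a), enumerate(clustering_b)):
        weight = len(a & b)  # edge weight = size of the intersection
        if weight > 0:
            edges[(i, j)] = weight
    return edges


# Two toy clusterings of the items 1..6.
clustering_a = [{1, 2, 3}, {4, 5}, {6}]
clustering_b = [{1, 2}, {3, 4, 5, 6}]
edges = intersection_graph(clustering_a, clustering_b)
print(edges)
```

Each key `(i, j)` pairs cluster `i` of the first clustering with cluster `j` of the second, so the dictionary is an edge list of the bipartite graph.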
“…In addition, studies comparing algorithms or generated clusters have been performed [37]. Cazals et al. proposed a framework to analyze the stability of clustering algorithms and compare clusters by introducing meta-clusters [10]. They defined the family-matching problems on an intersection graph.…”
Section: Clustering
Citation type: mentioning
confidence: 99%