Proceedings of the 26th Annual International Conference on Machine Learning 2009
DOI: 10.1145/1553374.1553511

Information theoretic measures for clusterings comparison

Abstract: Information theoretic based measures form a fundamental class of similarity measures for comparing clusterings, beside the class of pair-counting based and set-matching based measures. In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clusterings comparison. We observe that the baseline for such measures, i.e. average value between random partitions of a data set, does not take on a constant value, and tends to have larger variation when the ratio bet…
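For context, the correction for chance discussed in the abstract subtracts the expected mutual information between random partitions from the observed value. One common adjusted form is shown below; the max-entropy denominator is one of several normalization variants (maximum, minimum, square root, or arithmetic mean of the entropies), and the paper's own choice may differ:

\mathrm{AMI}(U, V) = \frac{I(U, V) - \mathbb{E}[I(U, V)]}{\max\{H(U), H(V)\} - \mathbb{E}[I(U, V)]}

where I(U, V) is the mutual information between partitions U and V, H(\cdot) is the partition entropy, and the expectation is taken over random partitions with the same cluster sizes.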

Cited by 737 publications (533 citation statements) · References 11 publications
“…Unlike classifiers, clustering algorithms have no notion of candidate labels, so cluster assignments are evaluated against the ground truth authors with measures based on cluster agreement: whether (a) programs by the same author are assigned to the same cluster, and (b) programs by different authors are assigned to different clusters. We computed several common measures of cluster agreement, including Adjusted Mutual Information (AMI), Normalized Mutual Information (NMI), and the Adjusted Rand Index (ARI); we prefer AMI because it is stable across different numbers of clusters, easing comparison of different data sets [19]. All of the measures we use take values in the range [0, 1], where higher scores indicate better cluster agreement.…”
Section: Clustering (mentioning)
confidence: 99%
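As a concrete illustration of the measures named in this statement, the following minimal Python sketch computes AMI, NMI, and ARI with scikit-learn; the author and cluster labels are invented for illustration:

from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    normalized_mutual_info_score,
)

# Hypothetical ground-truth author labels and predicted cluster labels.
true_authors = [0, 0, 1, 1, 2, 2]
clusters = [1, 1, 0, 0, 2, 2]

# Cluster labels are arbitrary: only the grouping matters, so a perfect
# grouping under permuted labels still scores 1.0 on all three measures.
print(adjusted_mutual_info_score(true_authors, clusters))    # AMI -> 1.0
print(normalized_mutual_info_score(true_authors, clusters))  # NMI -> 1.0
print(adjusted_rand_score(true_authors, clusters))           # ARI -> 1.0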
“…We quantify Sim(E_prof, GT_prof) and Prof_A by relying on two commonly used entropy-based distance metrics, namely the Normalized Mutual Information (NMI) and the Adjusted Mutual Information (AMI). NMI assesses the similarity of two groupings of the same items (in our case, E_prof and GT_prof) and takes values closer to 1 the more similar the groupings are [19,20]. On the other hand, AMI assesses the advantage of A in winning the ProfInd game.…”
Section: Quantifying Privacy In Bitcoin (mentioning)
confidence: 99%
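One widely used normalization of mutual information is the square-root form below; this particular variant is an assumption on our part, as the cited works may use a different denominator:

\mathrm{NMI}(U, V) = \frac{I(U, V)}{\sqrt{H(U)\, H(V)}}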
“…On the other hand, AMI assesses the advantage of A in winning the ProfInd game. More specifically, given the two groupings E_prof and GT_prof, AMI approaches 0 when E_prof is close to a random assignment of addresses/transactions to groups, i.e., E^R_prof, and is 1 when E_prof matches GT_prof [19,20].…”
Section: Quantifying Privacy In Bitcoin (mentioning)
confidence: 99%
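The behaviour described here, AMI near 0 for random groupings while uncorrected NMI stays well above 0, can be sketched with scikit-learn; the sizes and seed below are arbitrary assumptions:

import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, normalized_mutual_info_score

rng = np.random.default_rng(0)
n_items, n_groups = 1000, 50

# Two independent random groupings, analogous to comparing a random
# estimate E_prof against a ground truth GT_prof.
a = rng.integers(0, n_groups, n_items)
b = rng.integers(0, n_groups, n_items)

print(normalized_mutual_info_score(a, b))  # clearly above 0 despite randomness
print(adjusted_mutual_info_score(a, b))    # close to 0, as expected by chance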
“…As long as the presence of drift is indecisive, no examples are forgotten, thus incrementally increasing the window size. According to [15], this window adjustment strategy may efficiently detect radical changes in the underlying concept, subject to a relatively low rate of change. The FLORA algorithms also assume a limited rate of data arrival, since they process one example at a time.…”
Section: State Of The Art (mentioning)
confidence: 99%
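A minimal sketch of the window-adjustment strategy described above, processing one example at a time; this is not the actual FLORA algorithm, and the class name, shrink rule, and default factor are illustrative assumptions:

from collections import deque

class AdaptiveWindow:
    """Grow the window while drift is indecisive; shrink it on suspected drift."""

    def __init__(self, shrink_factor=0.5):
        self.window = deque()
        self.shrink_factor = shrink_factor

    def add(self, example, drift_suspected):
        # Examples arrive one at a time, assuming a limited rate of arrival.
        self.window.append(example)
        if drift_suspected:
            # Forget the oldest examples so the window tracks the new concept.
            keep = max(1, int(len(self.window) * self.shrink_factor))
            while len(self.window) > keep:
                self.window.popleft()
        # Otherwise no examples are forgotten, so the window grows incrementally.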