2008
DOI: 10.1007/s10791-008-9066-8

A comparison of extrinsic clustering evaluation metrics based on formal constraints

Abstract: There is a wide set of evaluation metrics available to compare the quality of text clustering algorithms. In this article, we define a few intuitive formal constraints on such metrics which shed light on which aspects of the quality of a clustering are captured by different metric families. These formal constraints are validated in an experiment involving human assessments, and compared with other constraints proposed in the literature. Our analysis of a wide range of metrics shows that only BCubed satisfies a…


Cited by 595 publications (234 citation statements)
References 11 publications (9 reference statements)
“…We use argument labels of Hasan and Ng (2014) as target clusters. As noted by Amigó et al (2009), external cluster evaluation is a non-trivial task and there is no consensus on the best approach. We therefore chose to use two established, but rather different measures: the Adjusted Rand Index (ARI) (Hubert and Arabie, 1985) and the information-theoretic V-measure (Rosenberg and Hirschberg, 2007).…”
Section: Analysis 1: Clustering Models (mentioning, confidence: 99%)
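The statement above pairs two external clustering measures. As a minimal illustrative sketch (not code from the cited papers), the Adjusted Rand Index of Hubert and Arabie (1985) can be computed directly from two parallel label lists via pairwise co-assignment counts:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(pred, gold):
    """Adjusted Rand Index between a predicted clustering and gold classes.

    pred, gold: parallel lists giving each item's cluster id / class id.
    Returns 1.0 for identical partitions (up to relabeling), ~0 for chance.
    """
    n = len(pred)
    # Contingency counts: n_ij, row sums a_i, column sums b_j.
    pair_counts = Counter(zip(pred, gold))
    a = Counter(pred)
    b = Counter(gold)
    # Pairs of items co-assigned in both, in pred only, in gold only.
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)   # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

Perfect agreement up to relabeling scores 1.0; the chance correction is what distinguishes ARI from the raw Rand Index. (The sketch does not guard against degenerate inputs such as a single all-inclusive cluster, where the denominator is zero.)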
“…However, there are many ways to evaluate clustering quality. Amigó et al (2009) propose a set of criteria which a clustering evaluation metric should satisfy, and demonstrate that most popular metrics fail to satisfy at least one of these criteria. However, they prove that all criteria are satisfied by the BCubed metric, which we therefore adopt.…”
Section: Detecting Similar Languages (mentioning, confidence: 99%)
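The BCubed metric adopted in the statement above averages a per-item precision and recall over all items. As a minimal sketch under the standard definition (not code from the cited work):

```python
def bcubed(clusters, labels):
    """BCubed precision and recall, averaged per item.

    clusters, labels: parallel lists giving each item's predicted
    cluster id and gold class id.
    For item i, precision is the fraction of i's cluster sharing i's
    class; recall is the fraction of i's class sharing i's cluster.
    """
    n = len(clusters)
    precision = recall = 0.0
    for i in range(n):
        same_cluster = [j for j in range(n) if clusters[j] == clusters[i]]
        same_label = [j for j in range(n) if labels[j] == labels[i]]
        correct = [j for j in same_cluster if labels[j] == labels[i]]
        precision += len(correct) / len(same_cluster)
        recall += len(correct) / len(same_label)
    return precision / n, recall / n
```

The per-item averaging is what lets BCubed satisfy the formal constraints (e.g. cluster-size sensitivity) that set-matching and pair-counting metrics violate; the two scores are typically combined with a harmonic mean into BCubed F.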
“…There exist a number of different metrics for evaluating cluster quality, including Precision and Recall, Normalized Mutual Information, F-score, B-cubed, et cetera [24]. We describe the one we chose below, which met our desire for a single number that summarizes the essentials and allows us to seamlessly compare performance across all algorithms and their parameters.…”
Section: Evaluation Protocol (mentioning, confidence: 99%)