2019
DOI: 10.1371/journal.pone.0222916

Why Cohen’s Kappa should be avoided as performance measure in classification

Abstract: We show that Cohen’s Kappa and Matthews Correlation Coefficient (MCC), both extended and contrasted measures of performance in multi-class classification, are correlated in most situations, albeit can differ in others. Indeed, although in the symmetric case both match, we consider different unbalanced situations in which Kappa exhibits an undesired behaviour, i.e. a worse classifier gets higher Kappa score, differing qualitatively from that of MCC. The debate about the incoherence in the behaviour of Kappa rev…
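
The comparison the abstract describes can be checked numerically. Below is a minimal sketch (not the authors' code) that computes both Cohen’s Kappa and MCC for the same predictions using scikit-learn; the two imbalanced confusion matrices and the helper labels_from_confusion are hypothetical choices for illustration, not cases taken from the paper.

```python
# Minimal sketch: compute Cohen's kappa and MCC side by side with scikit-learn.
import numpy as np
from sklearn.metrics import cohen_kappa_score, matthews_corrcoef

def labels_from_confusion(cm):
    """Expand a confusion matrix (rows = true class, columns = predicted class)
    into explicit (y_true, y_pred) label vectors."""
    y_true, y_pred = [], []
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            y_true.extend([i] * count)
            y_pred.extend([j] * count)
    return np.array(y_true), np.array(y_pred)

# Hypothetical imbalanced confusion matrices, chosen only for illustration.
cm_a = [[90, 10],
        [ 5,  5]]
cm_b = [[80, 20],
        [ 2,  8]]

for name, cm in [("A", cm_a), ("B", cm_b)]:
    y_true, y_pred = labels_from_confusion(cm)
    print(f"{name}: kappa={cohen_kappa_score(y_true, y_pred):.3f}, "
          f"MCC={matthews_corrcoef(y_true, y_pred):.3f}")
```

For the specific unbalanced settings in which the two scores rank classifiers differently, see the full text at the DOI above.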

Cited by 230 publications (148 citation statements)
References 44 publications

“…The current most popular and widespread metrics include Cohen’s kappa [70–72]: originally developed to test inter-rater reliability, in the last decades Cohen’s kappa entered the machine learning community for comparing classifiers’ performances. Despite its popularity, in the learning context there are a number of issues causing the kappa measure to produce unreliable results (for instance, its high sensitivity to the distribution of the marginal totals [73–75]), stimulating research for more reliable alternatives [76]. Due to these issues, we chose not to include Cohen’s kappa in the present comparison study.…”
Section: Introduction (mentioning)
confidence: 99%
“…Calculated indices are all based on four [75]. Due to recent concerns about Cohen's Kappa Coefficient assessments and its undesired behavior discussed in Reference [76], the MCC metric was also calculated to ensure the validity of our evaluation. MCC values close to 1 represent perfect agreement, the value of 0 is interpreted as random prediction and the value of −1 is interpreted as complete opposite predictions for observations.…”
Section: Results (mentioning)
confidence: 99%
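
The three reference points mentioned in the statement above (−1, 0, +1) are easy to verify directly. The snippet below is an assumed illustration using scikit-learn's matthews_corrcoef, not code from any of the cited works; the synthetic labels exist only to exercise the three cases.

```python
# Minimal sketch: MCC at its three reference points.
import numpy as np
from sklearn.metrics import matthews_corrcoef

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=10_000)  # synthetic binary ground truth

print(matthews_corrcoef(y_true, y_true))                           # perfect agreement -> 1.0
print(matthews_corrcoef(y_true, rng.integers(0, 2, size=10_000)))  # independent guesses -> ~0.0
print(matthews_corrcoef(y_true, 1 - y_true))                       # complete opposite -> -1.0
```
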
“…Values close to −1 and +1 indicate performance much worse than chance, and much better than chance, respectively. As some concerns have been raised regarding the use of kappa as a performance measure in classification (Delgado and Tibau 2019), we also provide the Matthews Correlation Coefficient (MCC).…”
Section: Methods (mentioning)
confidence: 99%