Proceedings of the Third Workshop on Argument Mining (ArgMining2016), 2016
DOI: 10.18653/v1/w16-2805

The CASS Technique for Evaluating the Performance of Argument Mining

Abstract: Argument mining integrates many distinct computational linguistics tasks, and as a result, reporting agreement between annotators or between automated output and a gold standard is particularly challenging. More worrying for the field, agreement and performance are also reported in a wide variety of different ways, making comparison between approaches difficult. To solve this problem, we propose the CASS technique for combining metrics covering different parts of the argument mining task. CASS delivers a justifi…

Cited by 9 publications (10 citation statements) · References 25 publications
“…The full annotation guidelines have been validated by calculating the inter-annotator agreement on an 11.3% sample, resulting in a Cohen (1960) κ of 0.610 and a CASS-κ (Duthie et al. 2016) of 0.752, both indicating substantial agreement according to Landis and Koch (1977)'s standard interpretation of the kappa metric. The resulting annotated US2016 corpus is freely available online at http://corpora.aifdb.org/US2016.…”
Section: The US2016 Corpus (mentioning)
confidence: 99%
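
As a concrete illustration of the two agreement ingredients in the statement above, here is a minimal Python sketch that computes Cohen (1960)'s κ from two annotators' label sequences and maps a κ value onto Landis and Koch (1977)'s interpretation bands. The annotator labels are hypothetical illustration data; only the two quoted κ values (0.610 and 0.752) come from the statement itself.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen (1960)'s kappa: chance-corrected agreement between two raters."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each rater's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def landis_koch(kappa):
    """Landis and Koch (1977)'s standard interpretation bands for kappa."""
    bands = [(0.00, "poor"), (0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial"), (1.00, "almost perfect")]
    return next(name for upper, name in bands if kappa <= upper)

# Hypothetical annotation labels for two annotators on the same six segments.
a = ["support", "attack", "support", "none", "support", "attack"]
b = ["support", "attack", "none",    "none", "support", "support"]
k = cohens_kappa(a, b)
print(f"kappa = {k:.3f} ({landis_koch(k)})")

for quoted in (0.610, 0.752):   # the kappa values quoted above
    print(quoted, "->", landis_koch(quoted))
```

Both quoted values fall in the 0.61–0.80 band, which is why the statement reads them as substantial agreement.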
“…While we prefer Cohen's κ metric over percentage agreement, because it accounts for chance agreement between annotators, it has the drawback that errors can be passed from text segmentation (a non-fixed task) to identifying relations, thus not providing a comprehensive agreement score. Duthie et al. (2016b) introduce the Combined Argument Similarity Score κ (CASS-κ), aimed at overcoming this by calculating intermediate agreement scores for the composite tasks of text segmentation, annotation of dialogical relations, and annotation of propositional relations, before combining the detailed calculations into an overall CASS-κ score (while still accounting for chance agreement). In Table 2, we include the pairwise CASS-κ scores, resulting in an overall CASS-κ for US2016 of 0.752.…”
Section: Validation (mentioning)
confidence: 99%
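
The statement above describes CASS-κ only at the level of its structure: score each composite task separately, then combine into one figure. A minimal sketch of that structure follows; note that the unweighted mean used here as the combination function is purely an assumption for illustration, as are the component values — Duthie et al. (2016b) define the actual combination.

```python
from dataclasses import dataclass

@dataclass
class ComponentScores:
    segmentation: float    # agreement on text segmentation (a non-fixed task)
    dialogical: float      # chance-corrected agreement on dialogical relations
    propositional: float   # chance-corrected agreement on propositional relations

def cass_combined(scores: ComponentScores) -> float:
    """Combine per-task scores into a single CASS-style score.

    ASSUMPTION: an unweighted mean, used only to illustrate the shape of
    the technique; the paper's actual combination function may differ.
    """
    parts = (scores.segmentation, scores.dialogical, scores.propositional)
    return sum(parts) / len(parts)

# Hypothetical component values, not taken from the paper.
print(f"CASS = {cass_combined(ComponentScores(0.81, 0.70, 0.74)):.3f}")
```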
“…For the inter-annotator agreement, a systematic sample of 10% of the corpus was extracted (by selecting every 10th argument map) and annotated by the second annotator. The samples were compared using the Combined Argument Similarity Score (CASS) technique (Duthie et al. 2016), resulting in Cohen's κ = 0.92. The number of occurrences for each label is given in Table II.…”
Section: The eRulemaking Debate Corpus (mentioning)
confidence: 99%
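
For completeness, here is a minimal sketch of the systematic sampling step this statement describes: selecting every 10th argument map to form a 10% sample for the second annotator. The corpus identifiers are hypothetical.

```python
# Hypothetical list of argument map identifiers standing in for the corpus.
corpus_maps = [f"map_{i:03d}" for i in range(1, 101)]

step = 10
# Start at index step-1 so the sample is every 10th map (10th, 20th, ...).
sample = corpus_maps[step - 1::step]

print(len(sample), "maps selected:", sample[:3], "...")
```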