Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.301

Let’s Stop Incorrect Comparisons in End-to-end Relation Extraction!

Abstract: Despite efforts to distinguish three different evaluation setups (Bekoulis et al., 2018a,b), numerous end-to-end Relation Extraction (RE) articles present unreliable performance comparisons to previous work. In this paper, we first identify several patterns of invalid comparisons in published papers and describe them to avoid their propagation. We then propose a small empirical study to quantify the impact of the most common mistake and show that it leads to overestimating the final RE performance by around 5% on ACE…

Cited by 21 publications (17 citation statements)
References: 35 publications
“…Evaluation Following suggestions in (Taillé et al., 2020), we evaluate Precision (P), Recall (R), and F1 scores with micro-averaging and adopt the Strict Evaluation criterion. Specifically, a predicted entity is correct if its type and boundaries are correct, and a predicted relation is correct if its relation type is correct and the boundaries and types of both argument entities are correct.…”
Section: Methods
Mentioning confidence: 99%
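The strict, micro-averaged scoring described in the statement above can be summarised with a short sketch. The code below is illustrative only and is not the evaluation script of the cited papers; the tuple layout for entities and relations, the label names, and the helper name micro_prf are assumptions made for the example.

# Minimal sketch of micro-averaged P/R/F1 under the Strict criterion.
# An entity is a (start, end, type) tuple; a relation is a
# (head_entity, tail_entity, relation_type) tuple, so tuple equality
# already encodes "boundaries and types of both arguments must match".

def micro_prf(predicted, gold):
    """Micro-averaged precision, recall and F1 over a pooled set of items."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                      # exact (strict) matches
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical example: one gold relation, two predictions, one exact match.
gold = [((0, 2, "PER"), (5, 6, "ORG"), "WORK_FOR")]
pred = [((0, 2, "PER"), (5, 6, "ORG"), "WORK_FOR"),
        ((8, 9, "LOC"), (5, 6, "ORG"), "LOCATED_IN")]
print(micro_prf(pred, gold))  # -> (0.5, 1.0, 0.666...)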
“…(2) strict evaluation (Rel+): in addition to what is required in the boundaries evaluation, predicted entity types must also be correct. More discussion of the evaluation settings can be found in Bekoulis et al. (2018) and Taillé et al. (2020).…”
Section: Evaluation Metrics
Mentioning confidence: 99%
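To make the contrast between the two criteria concrete, here is a hedged sketch; the relation representation and the function names are assumptions for illustration, not taken from Bekoulis et al. (2018) or Taillé et al. (2020).

# A relation is ((head_span, head_type), (tail_span, tail_type), rel_type),
# where each span is a (start, end) pair.

def boundaries_match(pred, gold):
    # Boundaries (Rel): argument spans and relation type must match;
    # the argument entity types are ignored.
    (p_h, _), (p_t, _), p_r = pred
    (g_h, _), (g_t, _), g_r = gold
    return p_h == g_h and p_t == g_t and p_r == g_r

def strict_match(pred, gold):
    # Strict (Rel+): spans, argument entity types and relation type
    # must all be correct.
    return pred == gold

# Hypothetical example where only the head entity type is wrong:
gold = (((0, 2), "PER"), ((5, 6), "ORG"), "WORK_FOR")
pred = (((0, 2), "LOC"), ((5, 6), "ORG"), "WORK_FOR")
print(boundaries_match(pred, gold))  # True
print(strict_match(pred, gold))      # False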
“…The term Relation Extraction is often used for different tasks and setups in the literature (Taillé et al., 2020). For clarity, we refer to Relation Extraction (RE) as the task of extracting triplets of relations between entities from raw text, with no given entity spans, usually also called end-to-end Relation Extraction.…”
Section: Relation Extraction
Mentioning confidence: 99%
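As a concrete illustration of this end-to-end setting, the snippet below shows one possible input/output layout; the sentence, the character offsets, and the label names (PER, ORG, LOC, WORK_FOR, LOCATED_IN) are hypothetical and only meant to show that both the entity spans and the typed relation triplets must be predicted from raw text.

# End-to-end RE: the only input is raw text; the system must predict
# the entity spans with their types and the typed relations between them.
text = "John Smith works for Acme Corp in Berlin."

predicted_entities = [
    {"span": (0, 10), "type": "PER"},   # "John Smith"
    {"span": (21, 30), "type": "ORG"},  # "Acme Corp"
    {"span": (34, 40), "type": "LOC"},  # "Berlin"
]

predicted_relations = [  # triplets over the predicted entities
    {"head": (0, 10), "tail": (21, 30), "type": "WORK_FOR"},
    {"head": (21, 30), "tail": (34, 40), "type": "LOCATED_IN"},
]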
“…While the aforementioned work highlights the relevance of Relation Extraction as a task, the lack of consistent baselines and a cohesive task definition has led to discrepancies in the use of datasets and in the way models have been evaluated. Taillé et al. (2020) describe the issues identified so far, and also attempt to unify RE evaluation and perform a fair comparison between systems.…”
Section: Relation Extraction
Mentioning confidence: 99%