Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems 2020
DOI: 10.18653/v1/2020.eval4nlp-1.10
A survey on Recognizing Textual Entailment as an NLP Evaluation

Abstract: Recognizing Textual Entailment (RTE) was proposed as a unified evaluation framework to compare semantic understanding of different NLP systems. In this survey paper, we provide an overview of different approaches for evaluating and understanding the reasoning capabilities of NLP systems. We then focus our discussion on RTE by highlighting prominent RTE datasets as well as advances in RTE datasets that focus on specific linguistic phenomena and can be used to evaluate NLP systems on a fine-grained level. We con…

Cited by 23 publications (12 citation statements) · References 62 publications (70 reference statements)
“…Our LingFeatured NLI dataset focuses on a specific linguistic phenomenon: the opposition of factivity vs. nonfactivity and the relation of these categories to semantic features such as entailment, contradiction and neutrality. We conclude that the specified datasets allow for a better specialization of ML models to narrow their scope of features to generalize (see [Poliak, 2020]). The three most important features of our dataset are as follows:…”
Section: Language Materials and Our Dataset
confidence: 85%
“…As mentioned in the introduction, the NLI task (Dagan et al., 2006, 2013), sometimes called Recognizing Textual Entailment (RTE), was extensively studied by the NLP community over the past several years as a semantic reasoning benchmark (see Poliak, 2020; Storks et al., 2019, for surveys). The field of fact verification (Vlachos and Riedel, 2014) also recently gained increased attention (Bekoulis et al., 2021; Kotonya and Toni, 2020; Guo et al., 2022; Zeng et al., 2021), sharing similar pair-wise semantic inference challenges, together with evidence retrieval.…”
Section: Related Work
confidence: 99%
“…We follow recent work that tests for an expanded range of inference patterns in RTE systems (Bernardy and Chatzikyriakidis, 2019) by evaluating how well RTE models capture specific linguistic phenomena, such as pragmatic inferences (Jeretic et al., 2020), veridicality, and others (Pavlick and Callison-Burch, 2016; White et al., 2017; Dasgupta et al., 2018; Naik et al., 2018; Glockner et al., 2018; Kim et al., 2019; Kober et al., 2019; Richardson et al., 2020; Yanaka et al., 2020; Vashishtha et al., 2020; Poliak, 2020).…”
Section: Related Work
confidence: 99%