Proceedings of the 24th Conference on Computational Natural Language Learning 2020
DOI: 10.18653/v1/2020.conll-1.4
TaxiNLI: Taking a Ride up the NLU Hill

Abstract: Pre-trained Transformer-based neural architectures have consistently achieved state-of-the-art performance in the Natural Language Inference (NLI) task. Since NLI examples encompass a variety of linguistic, logical, and reasoning phenomena, it remains unclear which specific concepts are learnt by the trained systems and where they can achieve strong generalization. To investigate this question, we propose a taxonomic hierarchy of categories that are relevant for the NLI task. We introduce TAXINLI, a new dataset…
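As a rough illustration of the category-wise evaluation the abstract describes, the sketch below reports per-category accuracy for an NLI classifier over examples annotated with taxonomic labels. The toy examples, category names, and `predict` stub are hypothetical, not TaxiNLI's actual data or pipeline.

```python
# Sketch: per-category NLI accuracy over taxonomically annotated examples.
# The example data and category names are illustrative, not from TaxiNLI.
from collections import defaultdict

# Each example: (premise, hypothesis, gold_label, taxonomic_category)
examples = [
    ("All dogs bark.", "Some dogs bark.", "entailment", "quantifier"),
    ("It rained on Monday.", "Monday was dry.", "contradiction", "negation"),
    ("She bought a car.", "She owns a vehicle.", "entailment", "lexical"),
]

def predict(premise: str, hypothesis: str) -> str:
    """Placeholder for a trained NLI model (e.g., a fine-tuned Transformer)."""
    return "entailment"  # stub prediction

correct, total = defaultdict(int), defaultdict(int)
for premise, hypothesis, gold, category in examples:
    total[category] += 1
    if predict(premise, hypothesis) == gold:
        correct[category] += 1

for category in sorted(total):
    print(f"{category}: {correct[category] / total[category]:.2%} "
          f"({correct[category]}/{total[category]})")
```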

Cited by 20 publications (26 citation statements). References 32 publications.
“…This ability to analyze the specific kinds of reasoning transformers have become proficient in is a clear advantage psychometrics has over typical NLP evaluations. The NLP community is becoming increasingly aware of the need to construct more fine-grained evaluation benchmarks (Wang et al., 2018; Joshi et al., 2020b), and we believe our work complements these efforts nicely.…”
Section: Discussion
confidence: 80%
“…Besides the GLUE diagnostic, other taxonomies have been proposed, such as TaxiNLI (Joshi et al., 2020b). Although TaxiNLI includes some types of reasoning that have no clear analogue in GLUE, many of its categories are quite similar.…”
Section: Related Work
confidence: 99%
“…Additionally, it might consist of alternate "explanations": features correlated with the task label in the dataset while not being task-relevant, which models can exploit to give the impression of good performance at the task itself. Two analysis methods have emerged to address this limitation: 1) diagnostic examples, where a small number of samples in a test set are annotated with linguistic phenomena of interest, and task accuracy is reported on these samples (Williams et al., 2018; Joshi et al., 2020). However, it is difficult to determine whether models perform well on diagnostic examples because they actually learn the linguistic competency, or because they exploit spurious correlations in the data (Gururangan et al., 2018; Poliak et al., 2018).…”
Section: Background and Related Work
confidence: 99%
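The passage above notes that apparent gains on diagnostic examples can come from spurious correlations rather than genuine linguistic competence. One common probe for such artifacts is a hypothesis-only baseline: if a classifier that never sees the premise still beats chance, the labels leak through surface cues (cf. Gururangan et al., 2018; Poliak et al., 2018). The sketch below is a minimal version of that probe; the toy data and model choice are illustrative assumptions, not the cited papers' setup.

```python
# Sketch: a hypothesis-only baseline as a probe for annotation artifacts.
# Above-chance accuracy without the premise suggests spurious cues.
# Data here is a toy stand-in; a real check would use the full training set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

hypotheses = [
    "Some dogs bark.", "Monday was dry.", "She owns a vehicle.",
    "Nobody was home.", "The cat is asleep.", "Everyone left early.",
]
labels = ["entailment", "contradiction", "entailment",
          "contradiction", "neutral", "neutral"]

# Train a classifier that sees only the hypothesis, never the premise.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(hypotheses, labels)

# Predictions from hypotheses alone; compare held-out accuracy to chance.
print(clf.predict(["Some cats bark.", "Tuesday was dry."]))
```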
“…Indeed, the fact that pretrained transformers can be used to create meaningful clusters has been shown in other recent works (cf. Aharoni and Goldberg (2020); Joshi et al. (2020)).…”
Section: Dreca
confidence: 99%
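As a minimal sketch of the clustering idea mentioned above, the snippet below embeds sentences with a pretrained Transformer and groups them with k-means. The checkpoint name, sentence set, and cluster count are illustrative assumptions; the cited works' exact pipelines differ.

```python
# Sketch: clustering sentences via pretrained Transformer embeddings,
# in the spirit of Aharoni and Goldberg (2020). Model and k are assumed.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

sentences = [
    "The stock market fell sharply today.",
    "Investors sold shares amid recession fears.",
    "The recipe calls for two cups of flour.",
    "Knead the dough until it is smooth.",
]

# Assumed checkpoint; any sentence-level encoder would do.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(sentences)

# Two clusters for this toy set: finance vs. cooking.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
for sentence, cluster in zip(sentences, kmeans.labels_):
    print(cluster, sentence)
```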