Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.774
Uncertain Natural Language Inference

Abstract: We introduce Uncertain Natural Language Inference (UNLI), a refinement of Natural Language Inference (NLI) that shifts away from categorical labels, targeting instead the direct prediction of subjective probability assessments. We demonstrate the feasibility of collecting annotations for UNLI by relabeling a portion of the SNLI dataset under a probabilistic scale, where items even with the same categorical label differ in how likely people judge them to be true given a premise. We describe a direct scalar regr…
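The abstract mentions a direct scalar regression approach. Below is a minimal sketch of what such a model could look like, assuming a BERT-style encoder via the Hugging Face transformers library; the encoder name, head design, sigmoid output, MSE loss, and the example pair and label are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of direct scalar regression for UNLI: predict a
# subjective probability in [0, 1] for a premise/hypothesis pair.
# Encoder choice and head design here are assumptions for illustration.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class UnliRegressor(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Single scalar head; sigmoid squashes the logit into [0, 1]
        # so the output reads as a probability judgment.
        self.head = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return torch.sigmoid(self.head(cls)).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = UnliRegressor()

# Premise/hypothesis pair with a scalar label, as in UNLI annotation.
batch = tokenizer(
    ["A man is sleeping on a park bench."],
    ["The man is tired."],
    return_tensors="pt", padding=True, truncation=True,
)
pred = model(batch["input_ids"], batch["attention_mask"])
target = torch.tensor([0.8])  # hypothetical human probability judgment
loss = nn.functional.mse_loss(pred, target)  # regression loss on the scalar
```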


Cited by 38 publications (42 citation statements)
References 33 publications (22 reference statements)
“…In this case, a piece of evidence contradicts a relative clause in the claim but does not refute the entire claim. Similar problems regarding the uncertainty of NLI tasks have been pointed out in previous works (Zaenen et al., 2005; Pavlick and Kwiatkowski, 2019; Chen et al., 2020a).…”
Section: Claim Labeling (supporting)
confidence: 76%
“…However, we find that the decision between REFUTED and NOTENOUGHINFO can be ambiguous in many-hop claims, and even the high-quality, trained annotators from Appen (rather than MTurk) cannot consistently choose the correct label from these two classes. Recent works (Pavlick and Kwiatkowski, 2019; Chen et al., 2020a) have raised concerns over the uncertainty of NLI tasks with categorical labels and proposed shifting to a probabilistic scale. Since this work mainly targets many-hop retrieval, we combine REFUTED and NOTENOUGHINFO into a single class, NOT-SUPPORTED.…”
Section: Introduction (mentioning)
confidence: 99%
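The label merge this excerpt describes is a simple relabeling step. A hypothetical sketch follows; the class names come from the excerpt, but the function name is illustrative and not from the cited paper's code.

```python
# Hypothetical sketch of the label collapse described above:
# REFUTED and NOTENOUGHINFO fold into a single NOT-SUPPORTED class.
def collapse_label(label: str) -> str:
    return "SUPPORTED" if label == "SUPPORTED" else "NOT-SUPPORTED"

assert collapse_label("REFUTED") == "NOT-SUPPORTED"
assert collapse_label("NOTENOUGHINFO") == "NOT-SUPPORTED"
assert collapse_label("SUPPORTED") == "SUPPORTED"
```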
“…It gained tremendous popularity again 10 years later, with the release of the large-scale Stanford Natural Language Inference dataset (SNLI; Bowman et al., 2015), which facilitated training neural models and was followed by several other datasets of that nature (Williams et al., 2018; Nie et al., 2019). But, among other criticisms of the task, it has been shown that people generally do not agree on entailment annotations (Pavlick and Kwiatkowski, 2019), and new variants of the task have been proposed that shift away from categorical labels to ordinal or numeric values denoting plausibility (Zhang et al., 2017; Sakaguchi and Van Durme, 2018; Chen et al., 2020). In this paper we focus on the defeasibility of textual entailments, a less well-studied phenomenon in this context.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…(Williams et al., 2018), JOCI (Zhang et al., 2017), and DNC (Poliak et al., 2018)). In their study, annotators had to select the degree to which a premise entails a hypothesis on a scale (Chen et al., 2020), instead of choosing among discrete labels. Pavlick and Kwiatkowski (2019) show that even though these datasets are reported to have high agreement scores, specific examples suffer from inherent disagreements.…”
Section: Results (mentioning)
confidence: 99%
“…In light of the low agreement on explicit modeling of the task of complement coercion, we turn to a different crowdsourcing approach that has proven successful for many linguistic phenomena: using NLI, as discussed above (§2). NLI has been used to collect data for a wide range of linguistic phenomena: Paraphrase Inference, Anaphora Resolution, Numerical Reasoning, Implicatures, and more (White et al., 2017; Poliak et al., 2018; Jeretic et al., 2020; Yanaka et al., 2020; Naik et al., 2018) (see Poliak (2020)). Therefore, we take a similar approach, with similar methodologies, and use NLI as an evaluation setup for the complement coercion phenomenon.…”
Section: NLI for Complement Coercion (mentioning)
confidence: 99%