Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019) 2019
DOI: 10.18653/v1/s19-1027

HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning

Abstract: Large crowdsourced datasets are widely used for training and evaluating neural models on natural language inference (NLI). Despite these efforts, neural models have a hard time capturing logical inferences, including those licensed by phrase replacements, so-called monotonicity reasoning. Since no large dataset has been developed for monotonicity reasoning, it is still unclear whether the main obstacle is the size of datasets or the model architectures themselves. To investigate this issue, we introduce a new d…

Citations: cited by 41 publications (53 citation statements)
References: 19 publications (18 reference statements)
“…These two figures show that, as the size of the upward training set increased, BERT performed better on upward inferences but worse on downward inferences, and vice versa. Previous work using HELP (Yanaka et al, 2019) reported that the BERT trained with MultiNLI and HELP containing both upward and downward inferences improved accuracy on both directions of monotonicity. MultiNLI rarely comes from downward inferences (see Section 4.3), and its size is large enough to be immune to the side-effects of downward inference examples in HELP.…”
Section: Effects Of Data Augmentation (mentioning)
confidence: 86%
“…To explore whether the performance of models on monotonicity reasoning depends on the training set or the model themselves, we conducted further analysis performed by data augmentation with the automatically generated monotonicity dataset HELP (Yanaka et al, 2019). HELP contains 36K monotonicity inference examples (7,784 upward examples, 21,192 downward examples, and 1,105 non-monotone examples).…”
Section: Data Augmentation For Analysis (mentioning)
confidence: 99%
“…We evaluate and analyze the proposed model on the monotonicity subset of Semantic Fragments (Richardson et al, 2020), HELP (Yanaka et al, 2019b) and MED (Yanaka et al, 2019a). We also extend MED to generate a dataset to help evaluate 2-hop inference.…”
Section: Relation (mentioning)
confidence: 99%
“…Data: We use three datasets that are designed for studying monotonicity based reasoning, i.e., HELP (Yanaka et al, 2019b), MED (Yanaka et al, 2019a), and the monotonicity subset of Semantic Fragments (Richardson et al, 2020). The HELP dataset has 35,891 inference pairs, which are automatically generated by conducting lexical substitution or deletion on one sentence to obtain the other, given natural logic polarity information of each word token and syntactic structure of sentences.…”
Section: Setup (mentioning)
confidence: 99%
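
The last citation statement above describes HELP pairs as generated by lexical substitution guided by the polarity of each token. A minimal sketch of that idea follows. It is not the authors' pipeline: the hypernym table, the hand-written polarity tags, and the function name `substitute` are all hypothetical and exist only for illustration. In an upward position a word is replaced by a more general one; in a downward position, by a more specific one. Either replacement should preserve entailment.

```python
# Toy lexicon (hypothetical): maps a word to a more general word.
HYPERNYMS = {"dog": "animal", "tulip": "flower"}
# Reverse direction: maps a general word to a more specific one.
HYPONYMS = {general: specific for specific, general in HYPERNYMS.items()}


def substitute(tokens, polarities):
    """Build entailed hypotheses from a premise by one-word substitution.

    tokens      -- list of words in the premise
    polarities  -- parallel list; each entry is "up", "down", or None
                   (assumed to be supplied by some external polarity tagger)
    Returns a list of (hypothesis_string, label) pairs.
    """
    pairs = []
    for i, (tok, pol) in enumerate(zip(tokens, polarities)):
        if pol == "up" and tok in HYPERNYMS:
            # Upward position: generalizing the word preserves truth.
            replacement = HYPERNYMS[tok]
        elif pol == "down" and tok in HYPONYMS:
            # Downward position: specializing the word preserves truth.
            replacement = HYPONYMS[tok]
        else:
            continue
        hypothesis = tokens[:i] + [replacement] + tokens[i + 1:]
        pairs.append((" ".join(hypothesis), "entailment"))
    return pairs


# "some" puts its noun in an upward position; "no" in a downward one.
upward = substitute("some dog barked".split(), [None, "up", "up"])
downward = substitute("no animal barked".split(), [None, "down", "down"])
print(upward)
print(downward)
```

Under these toy assumptions, the premise "some dog barked" yields the hypothesis "some animal barked", and "no animal barked" yields "no dog barked". The statement above also mentions deletion, which this sketch omits.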