Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.314

OCNLI: Original Chinese Natural Language Inference

Abstract: Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been limited to English due to a lack of reliable datasets for most of the world's languages. In this paper, we present the first large-scale NLI dataset (consisting of ∼56,000 annotated sentence pairs) for Chinese, called the Original Chinese Natural Language Inference dataset (OCNLI). Unlike recent attempts at extending…
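
For readers who want to inspect the dataset programmatically, here is a minimal sketch for loading one OCNLI split. It assumes a local copy of the JSON-lines files from the official release; the field names (sentence1, sentence2, label) follow the released data, but the file path is a hypothetical placeholder.

```python
import json
from collections import Counter

def load_ocnli(path):
    """Read one OCNLI split stored as JSON lines (one example per line)."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # sentence1 = premise, sentence2 = hypothesis; label is one of
            # entailment / neutral / contradiction (per the official release).
            examples.append((record["sentence1"], record["sentence2"], record["label"]))
    return examples

# Hypothetical path to a downloaded split.
train = load_ocnli("ocnli/train.json")
print(len(train))                               # number of annotated pairs
print(Counter(label for _, _, label in train))  # label distribution
```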

Cited by 48 publications (31 citation statements). References 41 publications.
Citation statements: 2 supporting, 26 mentioning, 0 contrasting.
“…We verify our findings with three popular English NLI datasets, SNLI (Bowman et al., 2015), MultiNLI (Williams et al., 2018b), and ANLI (Nie et al., 2020), and one Chinese dataset, OCNLI (Hu et al., 2020a). It is thus less likely that our findings result from some quirk of English or a particular tokenization strategy.…”
supporting
confidence: 77%
“…We train all models on MNLI, and evaluate on in-distribution (SNLI and MNLI) and out-of-distribution datasets (ANLI). We independently verify results of (a) using both our fine-tuned model using HuggingFace Transformers (Hu et al., 2020a). Bold marks the highest value per metric (red shows the model is insensitive to permutation).…”
Section: Results
mentioning
confidence: 99%
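
The statement above describes the standard transfer protocol for NLI: fine-tune on MNLI, then evaluate in and out of distribution. As a rough illustration (not the cited paper's exact setup), the evaluation step with HuggingFace Transformers might look like the sketch below; the checkpoint roberta-large-mnli is an off-the-shelf stand-in for the authors' own fine-tuned model.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Off-the-shelf MNLI checkpoint, standing in for a custom fine-tuned model.
name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Read the label order from the model config instead of hard-coding it.
pred = model.config.id2label[int(logits.argmax(dim=-1))]
print(pred)  # expected: ENTAILMENT for this pair
```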
“…Another approach brings computational linguists directly into the crowdsourcing process. This was recently demonstrated at a small scale by Hu et al. (2020) with OCNLI: they show that it is possible to significantly improve data quality by making small interventions during the crowdsourcing process, like offering additional bonus payments for examples that avoid overused words and constructions, without significantly limiting annotators' freedom to independently construct creative examples.…”
Section: Improving Validity
mentioning
confidence: 99%
“…In this writing task, we provide a context passage drawn from the Open American National Corpus (Ide and Suderman, 2006). Inspired by Hu et al. (2020), we ask workers to write two questions per passage with four answer choices each. We direct workers to ensure that the questions are answerable given the passage and that there is only one correct answer for each question.…”
Section: Writing Examples
mentioning
confidence: 99%