2022
DOI: 10.48550/arxiv.2201.05955
Preprint
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation

Abstract: A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We introduce a novel paradigm for dataset creation based on human and machine collaboration, which brings together the generative strength of language models and the evaluative strength of humans. Starting with an existing dataset, MultiNLI, our approach uses dataset cartography to automatically identify examples that demonstrate…
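The dataset-cartography step mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, not code from the WANLI release: it assumes you have recorded the model's probability of the gold label for each training example after each epoch, and it flags the high-variability ("ambiguous") examples that cartography treats as the most informative. The function names are hypothetical.

```python
import numpy as np

def cartography_stats(epoch_probs):
    """epoch_probs: shape (n_epochs, n_examples) — the model's probability
    of the gold label for each example, recorded after each epoch."""
    probs = np.asarray(epoch_probs, dtype=float)
    confidence = probs.mean(axis=0)   # mean gold-label probability over training
    variability = probs.std(axis=0)   # spread of that probability across epochs
    return confidence, variability

def select_ambiguous(epoch_probs, top_k):
    """Return the indices of the top_k most ambiguous (highest-variability)
    examples — the region of the data map cartography highlights."""
    _, variability = cartography_stats(epoch_probs)
    return np.argsort(-variability)[:top_k]
```

For example, an item the model scores at 0.5, 0.1, then 0.9 across three epochs has far higher variability than one pinned at 0.9 throughout, so `select_ambiguous` would surface it first.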

Cited by 16 publications (27 citation statements)
References 16 publications
“…It is therefore tempting to combine active learning with the use of a foundation model, to improve sample efficiency beyond what either method can do alone. While prior work has attempted to leverage language models to automate part of the labeling process [48,25], we are not aware of any work fine-tuning language models that has tried actively selecting points for human feedback. We seek to fill this gap in the literature in the remainder of the paper.…”
Section: Related Work
confidence: 99%
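The point-selection step described in the excerpt above — actively choosing which examples to send for human feedback — is commonly implemented with an uncertainty heuristic. The sketch below is a generic illustration, not from any of the cited papers: it assumes the model exposes per-class probabilities for a pool of unlabeled examples and picks the highest-entropy ones.

```python
import numpy as np

def entropy_select(probas, batch_size):
    """Uncertainty sampling: return the indices of the batch_size unlabeled
    examples whose predicted class distribution has the highest entropy."""
    p = np.clip(np.asarray(probas, dtype=float), 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return np.argsort(-entropy)[:batch_size]
```

A near-uniform prediction like (0.5, 0.5) is selected before a confident one like (0.99, 0.01), so annotator effort concentrates where the model is least sure.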
“…The benchmark comprises 115,530 sentence pairs, which include 8,421 idioms. A recurring challenge in crowdsourcing NLP-oriented datasets at scale is that human writers frequently rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity [11]. A new large-scale CIP dataset is created in this study by taking advantage of the collaboration between humans and machines.…”
Section: Table
confidence: 99%
“…The out-of-domain test set is collected separately by native Chinese crowd-workers without human–machine collaboration. Crowd-workers often adopt a limited set of writing strategies to speed up dataset construction, which harms the diversity of the dataset [11], [29]. The quality of the out-of-domain test set can be further improved.…”
Section: Human Evaluation
confidence: 99%
“…Scalable Oversight As AI systems become more capable of generating candidate responses, an emerging line of research supervises AI systems by providing preferences over AI-generated candidates rather than providing human demonstrations (Stiennon et al, 2020;Wiegreffe et al, 2021;Askell et al, 2021;Liu et al, 2022;Ouyang et al, 2022). Therefore, to supervise AI to perform more complex tasks, it becomes increasingly important to determine human preferences over model outputs that are expensive to verify, such as full-book summaries or natural language descriptions of distributional properties (Amodei et al, 2016;Wu et al, 2021;Zhong et al, 2022).…”
Section: Related Work
confidence: 99%