Findings of the Association for Computational Linguistics: EMNLP 2021
DOI: 10.18653/v1/2021.findings-emnlp.354
Want To Reduce Labeling Cost? GPT-3 Can Help

Abstract: Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although there exist various methods to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3 with 175 billion parameters has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that, to make the downstr…
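The pipeline the abstract describes — prompting a large LM for pseudo-labels, then training a cheaper model on them — can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: `llm_label` is a hypothetical stand-in for a real GPT-3 few-shot API call, and the "small model" is a toy per-word vote tally rather than the fine-tuned models used in the paper.

```python
# Hypothetical sketch of LM-as-labeler: pseudo-label unlabeled text with a
# large model, then train a small downstream model on those pseudo-labels.
from collections import Counter

FEW_SHOT_PROMPT = (
    "Label the sentiment of each review as positive or negative.\n"
    "Review: I loved it. Sentiment: positive\n"
    "Review: Terrible film. Sentiment: negative\n"
    "Review: {text} Sentiment:"
)

def llm_label(text):
    """Stand-in for a GPT-3 call; a real version would send
    FEW_SHOT_PROMPT.format(text=text) to the API and parse the completion."""
    return "positive" if "great" in text or "loved" in text else "negative"

def train_small_model(pairs):
    """Toy downstream model: each word votes with the pseudo-labels it co-occurred with."""
    votes = {}
    for text, label in pairs:
        for word in text.lower().split():
            votes.setdefault(word, Counter())[label] += 1

    def predict(text):
        tally = Counter()
        for word in text.lower().split():
            tally.update(votes.get(word, Counter()))
        return tally.most_common(1)[0][0] if tally else "negative"

    return predict

unlabeled = ["great acting and a great script", "I loved every minute", "dull and slow"]
pseudo = [(t, llm_label(t)) for t in unlabeled]   # labeling cost: API calls, not humans
model = train_small_model(pseudo)
print(model("a great movie"))  # -> positive
```

The key cost trade-off the paper studies is exactly this substitution: pseudo-labels from the large model are noisier than human labels but far cheaper per example.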

Cited by 65 publications (46 citation statements); references 8 publications.
“…Potential impact on annotation campaigns. The performance obtained by the proposed approach highlights how in-context learning techniques may be used side-by-side with human experts within annotation campaigns [31]. In general, the annotation of natural language texts describing business process models is a complicated task to perform from scratch. A potential impact of approaches like the ones discussed in this paper is to support the annotation task by providing candidate annotations for the process elements on which the performance is acceptable, e.g.…”
Section: Discussion
confidence: 98%
“…In terms of fixing bugs, fully automatic data augmentation with LMs (Yoo et al, 2021; Wang et al, 2021) cannot incorporate human "specification" beyond already existing data, nor debug phenomena that are very far from the existing data. On the other hand, general-purpose or contrastive counterfactuals have shown mixed or marginally positive results (Huang et al, 2020), similar to what we observed in Section 3.2, except when large quantities of data are gathered (Nie et al, 2020).…”
Section: Related Work
confidence: 99%
“…Out of these, only crowdsourcing can potentially fix bugs when enough data is gathered. On the other hand, fully automated approaches such as perturbations (Belinkov and Bisk, 2018;Prabhakaran et al, 2019), automatic adversarial examples (Ribeiro et al, 2018), and unguided data augmentation (Yoo et al, 2021;Wang et al, 2021) are severely restricted to specific kinds of problems (e.g. Ribeiro et al (2018) only deal with inconsistent predictions on paraphrases).…”
Section: Introduction
confidence: 99%
“…Another line of work uses the outputs from a prompted language model as weak labels, as we do in this work. Wang et al (2021) propose to train smaller models on labels from GPT-3 to reduce annotation cost, but they train from individual, uncalibrated prompts and do not attempt to refine the prompt model alongside the smaller model. fine-tune a separate RoBERTa model for each prompt using a small amount of labeled data.…”
Section: Related Work
confidence: 99%