Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.556
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity

Abstract: When primed with only a handful of training samples, very large, pretrained language models such as GPT-3 have shown competitive results when compared to fully-supervised, finetuned, large, pretrained language models. We demonstrate that the order in which the samples are provided can make the difference between near state-of-the-art and random guess performance: essentially some permutations are "fantastic" and some not. We analyse this phenomenon in detail, establishing that: it is present across model sizes…
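The abstract's central claim is that the same few-shot examples, merely reordered, yield different prompts and therefore different accuracy. A minimal sketch of what "permuting the prompt" means in practice (the task, example texts, and template below are illustrative assumptions, not taken from the paper):

```python
from itertools import permutations

# Hypothetical few-shot demonstrations for sentiment classification
# (illustrative data, not drawn from the paper's benchmarks).
examples = [
    ("The film was a delight.", "positive"),
    ("A tedious, joyless slog.", "negative"),
    ("Surprisingly moving.", "positive"),
]

def build_prompt(ordered_examples, test_input):
    """Concatenate demonstrations in the given order, then append the test input."""
    demos = "\n".join(
        f"Review: {text}\nSentiment: {label}" for text, label in ordered_examples
    )
    return f"{demos}\nReview: {test_input}\nSentiment:"

# Every ordering of the same 3 examples yields a distinct prompt string;
# the paper's finding is that these can differ wildly in downstream accuracy.
prompts = [build_prompt(p, "I loved it.") for p in permutations(examples)]
print(len(prompts))  # 3! = 6 distinct prompts from identical examples
```

With k demonstrations there are k! orderings, which is why the paper treats ordering as a search problem rather than an afterthought.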

Cited by 192 publications
(175 citation statements)
References 21 publications
“…Prompt construction requires a non-trivial combinatorial search over the prompt's wording, whether to include training examples, and how to convert LM probabilities to class predictions. As a consequence, prompts are either designed using human intuition that is hard to replicate and apply in a principled manner (Perez et al., 2021), or using automated methods (Shin et al., 2020; Gao et al., 2021; Lu et al., 2021). These methods search for elements such as: (1) the text of the pattern, (2) the tokens in the verbalizers, and (3) whether and how training examples are prepended before the test input.…”
Section: Constructing the Prompt
confidence: 99%
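The statement above describes prompt design as a combinatorial search over three axes: pattern text, verbalizer tokens, and the ordering of prepended demonstrations. A rough sketch of how those axes multiply (the candidate patterns and verbalizers here are invented for illustration; this is not any paper's actual search procedure):

```python
from itertools import permutations, product

# Illustrative search space for a binary sentiment task
# (all candidate values are assumptions, not from the cited work).
patterns = ["Review: {x} Sentiment:", "{x} All in all, it was"]
verbalizers = [("positive", "negative"), ("great", "terrible")]
demo_orders = list(permutations(range(3)))  # all orderings of 3 training examples

# The full space of prompt configurations is the cross product of the axes.
search_space = list(product(patterns, verbalizers, demo_orders))
print(len(search_space))  # 2 patterns * 2 verbalizers * 6 orderings = 24
```

Even this toy setup yields 24 configurations from tiny candidate sets, which is why the quoted passage calls the search "non-trivial": real spaces grow multiplicatively with each axis.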
“…Thus far, we have shown that prompt-based finetuning can simplify prompt engineering at the cost of memory inefficiency: a new set of parameters must be learned for each task. This is in contrast to in-context learning, which holds all model weights fixed but is heavily influenced by small prompt modifications (Zhao et al., 2021; Lu et al., 2021). In this section, we investigate how to achieve both memory efficiency and simple prompts.…”
Section: Achieving Simplicity and Efficiency
confidence: 99%
“…Our method provides a way to collect large amounts of training data using a small set of labeled seed examples, and allows for more direct control over what the model learns compared to relatively brittle prompts (Lu et al., 2021).…”
Section: Introduction
confidence: 99%
“…Notably, pre-trained language models (PLMs) have learned a substantial amount of in-depth knowledge from data, and have achieved tremendous promise in few-shot/zero-shot learning with natural language prompts [12,48,54]. However, recent studies [35,37,56] observe that prompt learning with PLMs usually generalizes unstably in extremely low-resource settings or emerging domains. One potential reason is that it is non-trivial for parametric models to learn rare or hard patterns well through rote memorization, resulting in weak generalization performance.…”
Section: Introduction
confidence: 99%