Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.394

How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?

Abstract: Task-agnostic forms of data augmentation have proven widely effective in computer vision, even on pretrained models. In NLP similar results are reported most commonly for low data regimes, non-pretrained models, or situationally for pretrained models. In this paper we ask how effective these techniques really are when applied to pretrained transformers. Using two popular varieties of task-agnostic data augmentation (not tailored to any particular task), Easy Data Augmentation (Wei and Zou, 2019) and Back-Trans…
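
For context, Easy Data Augmentation (Wei and Zou, 2019) consists of four simple token-level operations: synonym replacement, random insertion, random swap, and random deletion. The sketch below illustrates two of these operations in Python; the function signatures and default parameters are illustrative and are not taken from the authors' reference implementation.

    import random

    def random_swap(tokens, n=1):
        """EDA 'random swap': swap the positions of two randomly chosen tokens, n times."""
        tokens = list(tokens)
        for _ in range(n):
            if len(tokens) < 2:
                break
            i, j = random.sample(range(len(tokens)), 2)
            tokens[i], tokens[j] = tokens[j], tokens[i]
        return tokens

    def random_deletion(tokens, p=0.1):
        """EDA 'random deletion': drop each token independently with probability p."""
        kept = [t for t in tokens if random.random() > p]
        # Keep at least one token so the augmented example is never empty.
        return kept if kept else [random.choice(list(tokens))]

    sentence = "task agnostic augmentation for pretrained transformers".split()
    print(" ".join(random_swap(sentence, n=2)))
    print(" ".join(random_deletion(sentence, p=0.1)))

The other two operations, synonym replacement and random insertion, additionally require a thesaurus such as WordNet and are omitted here to keep the sketch self-contained.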

Cited by 49 publications (53 citation statements)
References 22 publications

“…More theoretic and experimental work is necessary to understand how approaches compare to each other and on which factors their effectiveness depends. Longpre et al (2020), for instance, hypothesized that data augmentation and pre-trained language models yield similar kind of benefits. Often, however, new techniques are just compared to similar methods and not across the range of low-resource approaches.…”
Section: Discussion
confidence: 99%
“…There is not yet a unified framework that allows applying data augmentation across tasks and languages. Recently, Longpre et al (2020) hypothesised that data augmentation provides the same benefits as pretraining in transformer models. However, we argue that data augmentation might be better suited to leverage the insights of linguistic or domain experts in low-resource settings when unlabeled data or hardware resources are limited.…”
Section: Data Augmentation
confidence: 99%
“…For NLI and QQP, we observed in a pilot study that randomly chosen counterfactuals may not be more effective than the same amount of additional data. We suspect that Polyjuice lacks domain knowledge and context for identifying critical perturbations, and therefore brings benefits redundant with pretraining (Longpre et al, 2020). Thus, we use the slicing functions of to find patterns of interest (e.g., prepositions in NLI), and perturb those patterns by placing [BLANK]s on the matched spans.…”
Section: Training With Counterfactuals
confidence: 99%
“…Minimal benefit for pretrained models on in-domain data: With the popularization of large pretrained language models, it has recently come to light that a couple of previously effective DA techniques for certain text classification tasks in English (Wei and Zou, 2019; Sennrich et al., 2016) provide little benefit for models like BERT and RoBERTa, which already achieve high performance on in-domain text classification (Longpre et al., 2020). One hypothesis for this could be that using simple DA techniques provides little benefit when finetuning large pretrained transformers on tasks for which examples are well-represented in the pretraining data, but DA methods could still be effective when finetuning on tasks for which examples are scarce or out-of-domain compared with the training data.…”
Section: Challenges and Future Directions
confidence: 99%
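
For reference, back-translation (Sennrich et al., 2016), the other technique discussed above, paraphrases an example by translating it into a pivot language and back into the source language. The sketch below assumes a hypothetical translate(text, src, tgt) function standing in for whatever machine translation model or service is available; it is not the API of any particular library.

    def translate(text: str, src: str, tgt: str) -> str:
        """Hypothetical stand-in for a machine translation model or service;
        not a call into any specific library."""
        raise NotImplementedError("plug in an MT model or service here")

    def back_translate(text: str, pivot: str = "de") -> str:
        """Back-translation: round-trip the input through a pivot language.
        The round trip typically yields a paraphrase, which is then added
        to the fine-tuning data with the original example's label."""
        pivot_text = translate(text, src="en", tgt=pivot)
        return translate(pivot_text, src=pivot, tgt="en")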