2021
DOI: 10.48550/arxiv.2105.03075
Preprint
A Survey of Data Augmentation Approaches for NLP

Cited by 45 publications (61 citation statements)
References 0 publications
“…For the completeness of treatment of the subject, we address those related surveys in the following. Feng et al. [14] conducted an extensive survey on data augmentation for NLP robustness. They studied various data augmentation techniques, including rule-based and model-based techniques, as strategies to robustify NLP models against adversarial attacks.…”
Section: Related Work (mentioning)
Confidence: 99%
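The rule-based augmentation techniques mentioned above can be illustrated with a minimal sketch. The function names and parameters below are illustrative, not taken from the survey; they implement two common rule-based operations (random swap and random deletion) over a tokenized sentence:

```python
import random

def random_swap(tokens, n_swaps=1, seed=0):
    """Rule-based augmentation: swap two randomly chosen token positions n_swaps times."""
    rng = random.Random(seed)
    tokens = list(tokens)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1, seed=0):
    """Rule-based augmentation: drop each token independently with probability p."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() > p]
    # Never return an empty sentence; fall back to one surviving token.
    return kept or [rng.choice(tokens)]
```

Model-based techniques, by contrast, generate augmented text with a trained model (e.g., back-translation or masked-token infilling) rather than with fixed token-level rules like these.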
“…The authors claim that they can certify the classifications of over 50% of texts against any perturbation of 5 words on the AGNEWS dataset, and of 2 words on the SST-2 dataset (dataset-dependent). The interested reader can find more details on data augmentation techniques, their advantages, and their disadvantages in the comprehensive survey [14].…”
Section: A. Data Augmentation (mentioning)
Confidence: 99%
“…Neural network robustness naturally complements the perspective offered by brittleness, as it involves the certification of a model against a wide range of attacks (Huang et al. 2017). In NLP, similarly to computer vision (Akhtar and Mian 2018), the majority of works have adopted the narrow notion of robustness, in terms of invariance to minor perturbations of an input text (Gowal et al. 2018; Jia et al. 2019; Dong et al. 2021; La Malfa et al. 2020), while only a minority have contested these limitations, either implicitly (Ribeiro et al. 2020) or explicitly (Morris 2020; Morris et al. 2020a; Xu et al. 2020), mainly due to the difficulty of automatically generating semantically involved test beds (Feng et al. 2021). Although adversarial data augmentation in NLP is well established (Morris et al. 2020b), robustness to semantically coherent, yet possibly diverging, examples is still in its 'adolescence' (Ribeiro, Singh, and Guestrin 2018), as many highly accurate NLP models cannot recognize cogent linguistic phenomena even on low-order tasks such as binary classification (Barnes, Øvrelid, and Velldal 2019).…”
Section: Related Work (mentioning)
Confidence: 99%