2021
DOI: 10.48550/arxiv.2110.01852
Preprint

Data Augmentation Approaches in Natural Language Processing: A Survey

Bohan Li,
Yutai Hou,
Wanxiang Che

Abstract: As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in many tasks. One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data. In this survey, we frame DA methods into three categories based on the diversity of augmented data, inc…

Cited by 5 publications (7 citation statements)
References 78 publications (186 reference statements)
“…The noising-based data augmentation methods can not only improve the translation quality of the model, but also improve its robustness [Miyato et al, 2017; Li et al, 2021]. To test robustness on noisy inputs, similarly to Cheng et al…”
Section: Results To Noisy Inputs (mentioning)
confidence: 99%
“…Data augmentation are techniques used to increase the amount of data by adding slightly modified copies of already existing data or newly created synthetic data from existing data [Li et al, 2021]. These methods can significantly boost the accuracy of deep learning methods.…”
Section: Introduction (mentioning)
confidence: 99%
“…Since the data format varies between different downstream tasks, there is still a gap for data processing methods between tasks. Therefore, currently, no universal method is effective for all NLG tasks [96]. Data pre-processing might result in grammatical errors or semantic transformation between the original and processed data, which can negatively affect the performance of generation.…”
Section: Future Directions In Mitigation Methods (mentioning)
confidence: 99%
“…Data augmentation (DA), a technique for increasing the number of data, is applied in many deep learning applications, for example, in image [18] including major convolutional neural network architectures such as [12] and successors, speech [17], and natural language processing [13]. Despite the various reports that demonstrate the effectiveness of DA, the working mechanism is not clear.…”
Section: Background and Motivation (mentioning)
confidence: 99%