Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1670

EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

Abstract: We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available …
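The four EDA operations named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `SYNONYMS` table is a toy stand-in (the paper draws synonyms from WordNet), and the parameter names `n` and `p` are illustrative.

```python
import random

# Toy synonym table for illustration only; the paper uses WordNet.
SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "joyful"]}

def synonym_replacement(tokens, n=1):
    """Replace up to n words that have known synonyms."""
    out = tokens[:]
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    for i in random.sample(candidates, min(n, len(candidates))):
        out[i] = random.choice(SYNONYMS[out[i]])
    return out

def random_insertion(tokens, n=1):
    """Insert a synonym of a random word at a random position, n times."""
    out = tokens[:]
    for _ in range(n):
        candidates = [t for t in out if t in SYNONYMS]
        if not candidates:
            return out
        syn = random.choice(SYNONYMS[random.choice(candidates)])
        out.insert(random.randrange(len(out) + 1), syn)
    return out

def random_swap(tokens, n=1):
    """Swap two randomly chosen positions, n times."""
    out = tokens[:]
    for _ in range(n):
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(tokens, p=0.1):
    """Drop each word with probability p, keeping at least one word."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]
```

Each operation returns a new token list, so a single sentence can be expanded into several augmented variants by applying the operations independently.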

Cited by 1,092 publications (858 citation statements)
References 24 publications
“…Although many augmentation methods exist for images, AutoAugment [29] was proposed to automatically search for augmentation policies based on the dataset. Beyond images, augmentation methods such as synonym replacement, random insertion, random swap, and random deletion are used for text classification [30], where training on only half of the data achieves the same accuracy as training on the full dataset. For speech recognition tasks, training audio is augmented by changing the audio speed [31] and by warping features, masking blocks of frequency channels, and masking blocks of time steps [32].…”
Section: B. Data Augmentation in Deep Learning
confidence: 99%
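The frequency-channel masking mentioned for speech augmentation [32] can be sketched as follows. This is an illustrative sketch, not the reference implementation: `freq_mask` and its parameters are assumed names, and the spectrogram is represented as a plain nested list of frames.

```python
import random

def freq_mask(spec, max_width=2, seed=None):
    """Zero out a random contiguous band of frequency channels.

    spec: list of frames, each a list of channel values.
    max_width: maximum number of channels to mask (illustrative parameter).
    """
    rng = random.Random(seed)
    n_channels = len(spec[0])
    width = rng.randint(0, max_width)
    start = rng.randint(0, n_channels - width)
    # The same band is zeroed in every frame, as in SpecAugment-style masking.
    return [
        [0.0 if start <= c < start + width else v for c, v in enumerate(row)]
        for row in spec
    ]
```

Time-step masking is the symmetric operation: zeroing a random contiguous block of frames instead of channels.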
“…Left-in datasets were merged and randomly split again into training and validation batches at a 9:1 ratio, resulting in ~3,300 and ~300 sentence pairs, respectively. The source sentences were augmented with random swap and random deletion operations, as described in [26], to further improve model generalization. As baseline models for predicting navigation steps at class level (Fig.…”
Section: Model Training and Baseline
confidence: 99%
“…Another solution is to artificially augment the existing dataset. This is common practice when working with image data [24]. Data augmentation involves operations such as scaling, rotation, translation, flipping, resizing, adding noise, and perspective transforms.…”
Section: Data Augmentation
confidence: 99%
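Two of the listed image operations (horizontal flipping and additive noise) can be sketched on a grayscale image stored as a nested list. Function names and the noise `scale` parameter are illustrative; real pipelines would typically use an array or tensor library instead.

```python
import random

def hflip(img):
    """Flip a grayscale image (list of rows) left-to-right."""
    return [row[::-1] for row in img]

def add_noise(img, scale=5, seed=None):
    """Add uniform noise in [-scale, scale] to every pixel."""
    rng = random.Random(seed)
    return [[px + rng.uniform(-scale, scale) for px in row] for row in img]
```

Geometric operations like rotation and perspective transforms follow the same pattern, mapping each output pixel back to a source coordinate.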