DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks
2020 | Preprint
DOI: 10.48550/arxiv.2011.01549

Cited by 7 publications (12 citation statements) | References 38 publications

“…Language modeling has already been used as an augmentation method to generate labeled and unlabeled examples for NER in DAGA (Ding et al., 2020). However, our taggers outperform the taggers presented on the gold standard by 30 points at size 1000 and 9 points at full size.…”
Section: Introduction (mentioning)
confidence: 93%
“…However, in tagging, paraphrasing using back-translation (Neuraz et al., 2018) does not bring significant improvements. Recent work shows that using language models learned on the training data to generate labeled and unlabeled examples can bring improvements (Ding et al., 2020).…”
Section: Related Work (mentioning)
confidence: 99%
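The back-translation paraphrasing mentioned in this quote can be sketched as a round trip through a pivot language. This is a hedged illustration, not the setup of Neuraz et al. (2018): `translate` is a hypothetical stand-in for any machine translation system, and the pivot language choice is arbitrary.

```python
# Sketch: paraphrase augmentation via back-translation.
# `translate` is a hypothetical MT function (assumption), e.g. wrapping an API
# or a local model; it takes a sentence and a target language code.

def back_translate(sentence, translate, pivot="fr", source="en"):
    """Round-trip a sentence through a pivot language to obtain a paraphrase."""
    pivoted = translate(sentence, target=pivot)
    return translate(pivoted, target=source)

# For tagging tasks the paraphrase changes wording and word order, so
# token-level labels no longer align with the new sentence, which is one
# plausible reason the quoted work sees no significant gains from this method.
```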
“…Similarly to (Zhou et al., 2019; Ding et al., 2020), we simulate a low-resource setting by randomly sampling tiny subsets of the training data. Since our focus is to measure the contextual learning ability of models, we first selected sentences of the CoNLL training data that contain at least one entity followed or preceded by 3 non-entity words.…”
Section: Low Resource Setting (mentioning)
confidence: 99%
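A minimal sketch of the sentence-selection step described in this quote, assuming BIO-tagged data where "non-entity" means the O tag; function and variable names are illustrative, and the exact filtering used by the cited authors may differ.

```python
# Sketch: simulating the quoted low-resource setting (assumptions: BIO tags,
# "non-entity" = O tag, context window of 3 on either side of an entity token).
import random

def has_entity_with_context(tags, window=3):
    """True if some entity token is directly preceded or followed by `window` O tags."""
    for i, tag in enumerate(tags):
        if tag == "O":
            continue
        before = tags[max(0, i - window):i]
        after = tags[i + 1:i + 1 + window]
        if (len(before) == window and all(t == "O" for t in before)) or \
           (len(after) == window and all(t == "O" for t in after)):
            return True
    return False

def sample_low_resource(sentences, size, seed=0):
    """Filter to sentences with enough non-entity context, then take a tiny random subset."""
    eligible = [s for s in sentences if has_entity_with_context([t for _, t in s])]
    random.Random(seed).shuffle(eligible)
    return eligible[:size]

# Usage (hypothetical data): subset = sample_low_resource(conll_train, size=1000)
```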
“…Ding et al. [163] introduced a generative language-model-based augmentation approach using an RNN language model for low-resource tagging tasks. This sentence-level augmentation approach linearized labeled sentences before training the language model, so that it learns the context and distribution of entity words.…”
Section: Data Augmentation For NER (mentioning)
confidence: 99%
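A minimal sketch of the linearization step described above, under the assumption of BIO-style tags: the tag of each labeled word is inserted as a token before that word, O tags are dropped, and a plain language model is then trained on the mixed stream. This follows my reading of the method as quoted; details of the actual DAGA scheme may differ.

```python
# Sketch: sentence linearization for language-model-based augmentation
# (assumed variant of the DAGA scheme; BIO tags, O labels are not emitted).

def linearize(tokens, tags):
    """Insert each non-O tag as a token immediately before the word it labels."""
    out = []
    for word, tag in zip(tokens, tags):
        if tag != "O":
            out.append(tag)   # tag token precedes its word
        out.append(word)
    return out

def delinearize(sequence):
    """Recover (word, tag) pairs from a generated linearized sequence."""
    pairs, pending = [], "O"
    for tok in sequence:
        if tok.startswith(("B-", "I-")):  # looks like a tag token
            pending = tok
        else:
            pairs.append((tok, pending))
            pending = "O"
    return pairs

# Example:
# linearize(["John", "lives", "in", "London"], ["B-PER", "O", "O", "B-LOC"])
#   -> ["B-PER", "John", "lives", "in", "B-LOC", "London"]
# A language model trained on such sequences can be sampled to generate new
# labeled sentences, which delinearize() maps back to (word, tag) pairs.
```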
“…A cross-domain augmentation approach [165] was explored to leverage data from high-resource domains and apply learned linguistic patterns such as structure, style, and noise to low-resource domains. In this feature-based augmentation approach, a linearized sentence pair [163] from the source and target domains is used as input to the autoencoder model. The model performs "word-by-word" denoising reconstruction followed by detransforming reconstruction.…”
Section: Data Augmentation For NER (mentioning)
confidence: 99%
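One way to read the paired-input construction in this quote is sketched below. The pairing format, separator token, and noise operations are all assumptions made for illustration; the cited work [165] may use a different scheme.

```python
# Sketch: building a noisy paired input for a denoising autoencoder over
# linearized sentences. Separator token and noise operations are assumptions.
import random

def add_word_noise(tokens, drop_p=0.1, max_shift=2, seed=0):
    """Word-level noise: random token dropout plus a light local reordering."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() > drop_p]
    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

def make_pair_input(src_linearized, tgt_linearized):
    """Concatenate noisy source- and target-domain linearized sentences."""
    return add_word_noise(src_linearized) + ["<sep>"] + add_word_noise(tgt_linearized)
```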