2019
DOI: 10.1007/978-3-030-22747-0_7

Conditional BERT Contextual Augmentation

Abstract: We propose a novel data augmentation method for labeled sentences called conditional BERT contextual augmentation. Data augmentation methods are often applied to prevent overfitting and improve the generalization of deep neural network models. Recently proposed contextual augmentation augments labeled sentences by randomly replacing words with more varied substitutions predicted by a language model. BERT demonstrates that a deep bidirectional language model is more powerful than either a unidirectional language mod…
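The abstract describes word-level substitution driven by a masked language model. A minimal sketch of that general idea, using an off-the-shelf BERT fill-mask pipeline, is shown below; the model name, helper function, and candidate-selection heuristic are illustrative assumptions, and the sketch omits the label-conditioning step that distinguishes conditional BERT from plain contextual augmentation.

```python
# Minimal sketch (assumed usage): unconditional masked-word substitution with a
# pretrained BERT via Hugging Face transformers. The paper's conditional BERT
# additionally feeds the sentence label to the model so substitutes stay
# label-compatible; that step is not shown here.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def contextual_augment(sentence: str, n_replace: int = 1) -> str:
    tokens = sentence.split()
    for _ in range(n_replace):
        i = random.randrange(len(tokens))
        original = tokens[i]
        tokens[i] = fill_mask.tokenizer.mask_token       # e.g. "[MASK]"
        predictions = fill_mask(" ".join(tokens))        # top candidates for the mask
        # Take the highest-scoring candidate that differs from the original word.
        tokens[i] = next((p["token_str"] for p in predictions
                          if p["token_str"].lower() != original.lower()),
                         original)
    return " ".join(tokens)

print(contextual_augment("the actors are fantastic", n_replace=1))
```

In the paper's conditional variant, the sentence label is fed to BERT through the segment embeddings, so the predicted substitutes remain compatible with the original label rather than merely with the surrounding context.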

Cited by 245 publications (177 citation statements)
References 31 publications
“…Google first proposed the BERT model, and it completely subverted the logic of training word vectors before training specific tasks in natural language processing [24]. Methods of fine-tuning the BERT model, such as extended text preprocessing and layer adjustment, have been proven to improve results substantially [66]. Wu et al. proposed a conditional BERT method, which enhances the text classification ability of the original BERT method through conditional prediction of masked words [67].…”
Section: Problem Statement
Citation type: mentioning (confidence: 99%)
“…Augmentation is done by sampling from the returned probability distribution. Kobayashi [42] trained a Bi-Directional LSTM language model with this approach, and Wu et al [43] enhanced the approach by using BERT as an underlying model. All the mentioned methods were evaluated in this study, and the implementation was done using the NLPAug library [44] except for contextual augmentation for which the released code by the authors of [42] was used.…”
Section: Similar Word Substitution Augmentation
Citation type: mentioning (confidence: 99%)
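The NLPAug library mentioned in the statement above exposes BERT-based contextual word substitution directly; a minimal usage sketch is given below. The model path and augmentation probability are placeholder choices, not the settings used in the cited study.

```python
# pip install nlpaug transformers torch
import nlpaug.augmenter.word as naw

# ContextualWordEmbsAug masks words and fills them with a masked language model.
# model_path and aug_p are illustrative values, not taken from the cited work.
aug = naw.ContextualWordEmbsAug(
    model_path="bert-base-uncased",
    action="substitute",
    aug_p=0.3,
)

text = "the actors are fantastic and the plot keeps you engaged"
print(aug.augment(text))  # recent nlpaug versions return a list of augmented strings
```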
“…The method replaces randomly chosen words in a sentence with a mixture of multiple related words based on a distributional representation. The contextual data augmentation studies [23,29], which are the most similar to ours, used a bidirectional LSTM-LM or MLM to change words in a sentence through a fill-in-the-blank task. These contextual data augmentation methods achieved state-of-the-art results on the text classification benchmark dataset.…”
Section: Data Augmentation
Citation type: mentioning (confidence: 99%)
“…In order to alleviate the aforementioned problem, many studies take advantage of augmenting the existing data [9, 19-24]. Data augmentation is used not only for small amounts of data but also for unbalanced data.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)