2021
DOI: 10.48550/arxiv.2103.14453
Preprint
Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers

Markus Bayer,
Marc-André Kaufhold,
Björn Buchhold
et al.

Abstract: In many cases of machine learning, research suggests that the development of training data may be more important than the choice and modelling of the classifiers themselves. Data augmentation methods have therefore been developed to improve classifiers with artificially created training data. In NLP, the challenge lies in establishing universal rules for text transformations that provide new linguistic patterns. In this paper, we present and evaluate a text generation method suitable to increase the perform…

Cited by 1 publication (1 citation statement)
References 25 publications (62 reference statements)
“…However, the shortage of maintenance text data may hinder the exploitation of this approach. Therefore, a NLP augmentation strategy could be helpful (Bayer, M., Kaufhold, M.-A., Buchhold, B., Keller, M., Dallmeyer, J., and Reuter, C., 2021), although the larger the data analyzed, the greater the chance that spurious correlations dominate the results and lead to erroneous conclusions (Dima, A., Lukens, S., Hodkiewicz, M., Sexton, T., and Brundage, M. P., 2021). Alternatively, fine-tuning a bigger pre-trained language model, which has become the de facto standard for doing transfer learning in NLP, could also be advantageous (Li, J., Tang, T., Zhao, W. X., and Wen, J.-R., 2021).…”
Section: Discussion
Confidence: 99%