2021
DOI: 10.1007/978-3-030-92273-3_35
TextCut: A Multi-region Replacement Data Augmentation Approach for Text Imbalance Classification

Cited by 6 publications (8 citation statements)
References 25 publications
“…Mixup [17], originally presented as a mixing-based data augmentation method in computer vision, has potential applications in enhancing the robustness [42-45] and security of deep learning models against attacks [46-51]. In natural language processing [52-56], Guo et al. [19] presented two strategies for applying mixup to sentence classification, operating on word embeddings and on sentence embeddings. TMix [4] mixes two samples in hidden spaces.…”
Section: Data Augmentation
confidence: 99%
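As a rough illustration of the mixing idea summarized in the excerpt above (not the cited authors' code), the following minimal Python sketch interpolates two embedding or hidden-state vectors and their one-hot labels with a Beta-distributed coefficient, in the spirit of mixup and TMix; the vector dimension and the alpha value are arbitrary assumptions.

import numpy as np

def mixup(x1, x2, y1, y2, alpha=0.2, rng=np.random.default_rng()):
    # Draw a mixing coefficient from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Convexly combine the inputs (embeddings, or hidden states as in TMix)
    # and the one-hot labels to get a soft-labelled synthetic example.
    x_mix = lam * x1 + (1.0 - lam) * x2
    y_mix = lam * y1 + (1.0 - lam) * y2
    return x_mix, y_mix

# Usage: mix two hypothetical 768-dimensional sentence embeddings.
x_a, x_b = np.random.randn(768), np.random.randn(768)
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_new, y_new = mixup(x_a, x_b, y_a, y_b)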
“…InsGen [42] improved the data-efficiency of GANs by introducing instance discrimination tasks to the discriminator. APA [43] adaptively augmented the training data with generated data, sharing a similar motivation with ours but differing in implementation. MaskedGAN [44] proposed to mask images in both the spatial and spectral domains to prevent over-fitting of the discriminator.…”
Section: B. GANs on Limited Training Data
confidence: 99%
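For intuition only, the sketch below shows one generic way to mask an image array in the spatial and spectral (Fourier) domains, loosely in the spirit of the masking strategy mentioned above. It is an assumption-laden illustration, not MaskedGAN's actual procedure; the masking fractions are arbitrary choices.

import numpy as np

def mask_spatial(img, frac=0.3, rng=np.random.default_rng()):
    # Zero out one random rectangle covering roughly `frac` of the image area.
    h, w = img.shape[:2]
    mh, mw = max(1, int(h * frac ** 0.5)), max(1, int(w * frac ** 0.5))
    top = rng.integers(0, h - mh + 1)
    left = rng.integers(0, w - mw + 1)
    out = img.copy()
    out[top:top + mh, left:left + mw] = 0.0
    return out

def mask_spectral(img, frac=0.3, rng=np.random.default_rng()):
    # Drop roughly `frac` of the 2-D Fourier coefficients, then invert.
    spec = np.fft.fft2(img, axes=(0, 1))
    keep = rng.random(spec.shape) > frac
    return np.real(np.fft.ifft2(spec * keep, axes=(0, 1)))

# Usage on a dummy 64x64 grayscale image.
img = np.random.rand(64, 64)
aug = mask_spectral(mask_spatial(img), frac=0.2)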
“…Text augmentation generates new natural language instances of minority classes, ranging from simple string-based manipulations such as synonym replacements to Transformer-based generation. Easy Data Augmentation (EDA; Wei and Zou, 2019), which uses dictionary-based synonym replacements, random insertion, random swap, and random deletion, has been shown to work well in class-imbalanced settings (Jiang et al., 2021; Jang et al., 2021; Juuti et al., 2020). Juuti et al. (2020) generate new minority class instances for English binary text classification using EDA and embedding-based synonym replacements, and by adding a random majority class sentence to a minority class document.…”
Section: Data Augmentation
confidence: 99%
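To make the string-level operations concrete, here is a minimal sketch of two of the four EDA operations (random swap and random deletion); it is illustrative only and not the reference implementation of Wei and Zou (2019). Synonym replacement and random insertion are omitted because they need a synonym resource such as WordNet.

import random

def random_swap(tokens, n=1, rng=random.Random(0)):
    # Swap two randomly chosen token positions, repeated n times.
    tokens = list(tokens)
    for _ in range(n):
        if len(tokens) < 2:
            break
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1, rng=random.Random(0)):
    # Drop each token independently with probability p, keeping at least one.
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(list(tokens))]

print(random_swap("the quick brown fox".split(), n=2))
print(random_deletion("the quick brown fox jumps over the lazy dog".split(), p=0.2))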
“…Which categories matter is highly task-specific and may even depend on the intended downstream use. Developing methods that improve model performance in imbalanced data settings has been an active area for decades (e.g., Bruzzone and Serpico, 1997; Japkowicz et al., 2000; Estabrooks and Japkowicz, 2001; Park and Zhang, 2002; Tan, 2005), and is recently gaining momentum in the context of maturing neural approaches (e.g., Buda et al., 2018; Kang et al., 2020; Yang et al., 2020; Jiang et al., 2021; Spangher et al., 2021). The problem is exacerbated when classes overlap in the feature space (Lin et al., 2019; Tian et al., 2020).…”
Section: Introduction
confidence: 99%