2021
DOI: 10.1007/978-3-030-92273-3_35
TextCut: A Multi-region Replacement Data Augmentation Approach for Text Imbalance Classification

Cited by 6 publications (8 citation statements)
References 25 publications
“…Mixup [17], originally presented as a mixing-based data augmentation method in computer vision, has potential applications in enhancing the robustness [42-45] and security of deep learning models against attacks [46-51]. In natural language processing [52-56], Guo et al. [19] presented two strategies for applying mixup to sentence classification, operating on word embeddings and on sentence embeddings. TMix [4] mixes two samples in hidden spaces.…”
Section: Data Augmentation
confidence: 99%
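As a rough illustration of the mixing idea summarized in the excerpt above (not the cited authors' code), the following minimal Python sketch interpolates two embedding or hidden-state vectors and their one-hot labels with a Beta-distributed coefficient, in the spirit of mixup and TMix; the vector dimension and the alpha value are arbitrary assumptions.

import numpy as np

def mixup(x1, x2, y1, y2, alpha=0.2, rng=np.random.default_rng()):
    # Draw a mixing coefficient from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Convexly combine the inputs (embeddings, or hidden states as in TMix)
    # and the one-hot labels to get a soft-labelled synthetic example.
    x_mix = lam * x1 + (1.0 - lam) * x2
    y_mix = lam * y1 + (1.0 - lam) * y2
    return x_mix, y_mix

# Usage: mix two hypothetical 768-dimensional sentence embeddings.
x_a, x_b = np.random.randn(768), np.random.randn(768)
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_new, y_new = mixup(x_a, x_b, y_a, y_b)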
“…InsGen [42] improved the data-efficiency of GANs by introducing instance discrimination tasks to the discriminator. APA [43] adaptively augmented the training data with generated data, sharing a similar motivation with ours but differing in implementation. MaskedGAN [44] proposed to mask images in both the spatial and spectral domains to prevent over-fitting of the discriminator.…”
Section: B. GANs on Limited Training Data
confidence: 99%
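For intuition only, the sketch below shows one generic way to mask an image array in the spatial and spectral (Fourier) domains, loosely in the spirit of the masking strategy mentioned above. It is an assumption-laden illustration, not MaskedGAN's actual procedure; the masking fractions are arbitrary choices.

import numpy as np

def mask_spatial(img, frac=0.3, rng=np.random.default_rng()):
    # Zero out one random rectangle covering roughly `frac` of the image area.
    h, w = img.shape[:2]
    mh, mw = max(1, int(h * frac ** 0.5)), max(1, int(w * frac ** 0.5))
    top = rng.integers(0, h - mh + 1)
    left = rng.integers(0, w - mw + 1)
    out = img.copy()
    out[top:top + mh, left:left + mw] = 0.0
    return out

def mask_spectral(img, frac=0.3, rng=np.random.default_rng()):
    # Drop roughly `frac` of the 2-D Fourier coefficients, then invert.
    spec = np.fft.fft2(img, axes=(0, 1))
    keep = rng.random(spec.shape) > frac
    return np.real(np.fft.ifft2(spec * keep, axes=(0, 1)))

# Usage on a dummy 64x64 grayscale image.
img = np.random.rand(64, 64)
aug = mask_spectral(mask_spatial(img), frac=0.2)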
“…Text augmentation generates new natural language instances of minority classes, ranging from simple string-based manipulations such as synonym replacements to Transformer-based generation. Easy Data Augmentation (EDA; Wei and Zou, 2019), which uses dictionary-based synonym replacements, random insertion, random swap, and random deletion, has been shown to work well in class-imbalanced settings (Jiang et al., 2021; Jang et al., 2021; Juuti et al., 2020). Juuti et al. (2020) generate new minority class instances for English binary text classification using EDA and embedding-based synonym replacements, and by adding a random majority class sentence to a minority class document.…”
Section: Data Augmentation
confidence: 99%
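To make the string-level operations concrete, here is a minimal sketch of two of the four EDA operations (random swap and random deletion); it is illustrative only and not the reference implementation of Wei and Zou (2019). Synonym replacement and random insertion are omitted because they need a synonym resource such as WordNet.

import random

def random_swap(tokens, n=1, rng=random.Random(0)):
    # Swap two randomly chosen token positions, repeated n times.
    tokens = list(tokens)
    for _ in range(n):
        if len(tokens) < 2:
            break
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1, rng=random.Random(0)):
    # Drop each token independently with probability p, keeping at least one.
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(list(tokens))]

print(random_swap("the quick brown fox".split(), n=2))
print(random_deletion("the quick brown fox jumps over the lazy dog".split(), p=0.2))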
“…Which categories matter is highly task-specific and may even depend on the intended downstream use. Developing methods that improve model performance in imbalanced data settings has been an active area for decades (e.g., Bruzzone and Serpico, 1997; Japkowicz et al., 2000; Estabrooks and Japkowicz, 2001; Park and Zhang, 2002; Tan, 2005), and is recently gaining momentum in the context of maturing neural approaches (e.g., Buda et al., 2018; Kang et al., 2020; Yang et al., 2020; Jiang et al., 2021; Spangher et al., 2021). The problem is exacerbated when classes overlap in the feature space (Lin et al., 2019; Tian et al., 2020).…”
Section: Introduction
confidence: 99%