Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.98
SetConv: A New Approach for Learning from Imbalanced Data

Abstract: For many real-world classification problems, e.g., sentiment classification, most existing machine learning methods are biased towards the majority class when the Imbalance Ratio (IR) is high. To address this problem, we propose a set convolution (SetConv) operation and an episodic training strategy to extract a single representative for each class, so that classifiers can later be trained on a balanced class distribution. We prove that our proposed algorithm is permutation-invariant despite the order of input…
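The abstract's key property — a set-to-representative mapping that is permutation-invariant — can be illustrated with a minimal sketch. This is not the authors' SetConv implementation (the paper's learned convolution kernel is not shown here); it only demonstrates the general principle that aggregating a set by a symmetric operation (a weighted mean) makes the output independent of element order. The `weights` parameter is a hypothetical stand-in for what a learned kernel would produce.

```python
# Illustrative sketch (NOT the paper's SetConv): a permutation-invariant
# aggregator mapping a variable-size set of feature vectors to a single
# class representative. Invariance follows because summation over set
# elements is order-independent.
import numpy as np

def set_representative(features, weights=None):
    """Aggregate one class's feature vectors into a single representative.

    features: (n_samples, dim) array, one row per instance of the class.
    weights:  optional per-instance weights (hypothetical; a learned
              kernel would play this role in a real set-convolution layer).
    """
    features = np.asarray(features, dtype=float)
    if weights is None:
        weights = np.ones(len(features))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()   # normalize to a convex combination
    return weights @ features           # weighted mean over the set

# Permutation invariance: shuffling the rows leaves the output unchanged.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
perm = rng.permutation(5)
assert np.allclose(set_representative(x), set_representative(x[perm]))
```

Because each class — however many instances it has — is reduced to one representative, a downstream classifier effectively sees a balanced class distribution, which is the motivation stated in the abstract.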

Cited by 9 publications (6 citation statements) · References 26 publications
“…2) MC over-predicts majority classes in both datasets (2s and 3s for MR and 1s and 10s for IMDb) while under-predicting the others (except 2s and 3s in IMDb). These results are in line with the common observation that MC models tend to overfit on the majority classes in imbalanced datasets, which motivates the use of "oversampling" or class balancing (Buda et al., 2018; Chawla et al., 2002; Tepper et al., 2020; Gao et al., 2020). OR, in contrast, provides a better fit for MR (slightly under-predicting for 1s), but significantly under-predicts on IMDb majority classes, displaying a much flatter distribution of predictions.…”

Section: Dataset Benchmarks (supporting)
confidence: 83%
“…More concrete definitions, e.g., regarding the relative share up to which a class is seen as a minority class, depend on the task, dataset and labelset size. Much research focuses on improving all minority classes equally while maintaining or at least monitoring majority class performance (e.g., Huang et al., 2021; Yang et al., 2020; Spangher et al., 2021). We next discuss prototypical types of imbalance (Sec.…”

Section: Problem Definition (mentioning)
confidence: 99%
“…Under imbalance, two issues arise. First, although class-specific weights have been used with BCE (e.g., Yang et al., 2020), their effect on minority classes is less clear than in the single-label case. For each instance, all classes contribute to BCE, with the labels not assigned to the instance (called negative classes) included via (1 − y_j) log(1 − p_j).…”

Section: Loss Functions (mentioning)
confidence: 99%
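The point in the quote above — that under binary cross-entropy every class contributes to each instance's loss, with negative classes entering through the (1 − y_j) log(1 − p_j) term — can be made concrete with a small sketch. The `class_weights` parameter here is illustrative (weighting schemes vary across the cited papers), not a specific proposal from any of them.

```python
# Sketch of per-class binary cross-entropy for one multi-label instance.
# Negative classes (y_j = 0) still contribute loss whenever p_j > 0,
# via the (1 - y_j) * log(1 - p_j) term.
import math

def bce_per_class(y, p, class_weights=None):
    """Per-class BCE terms for one instance.

    y: 0/1 label per class, p: predicted probability per class,
    class_weights: optional per-class weights (illustrative only).
    """
    if class_weights is None:
        class_weights = [1.0] * len(y)
    return [-w * (yj * math.log(pj) + (1 - yj) * math.log(1 - pj))
            for yj, pj, w in zip(y, p, class_weights)]

# One positive class and two negative classes: all three terms are nonzero,
# so up-weighting a minority class also interacts with its negative terms.
losses = bce_per_class([1, 0, 0], [0.9, 0.2, 0.05])
```

This is why, as the quote notes, class-specific weights have a less transparent effect on minority classes in the multi-label BCE setting than in the single-label case: a weight scales both a class's positive and negative contributions.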