2020
DOI: 10.48550/arxiv.2009.01571
Preprint
MixBoost: Synthetic Oversampling with Boosted Mixup for Handling Extreme Imbalance

Abstract: Training a classification model on a dataset where the instances of one class outnumber those of the other class is a challenging problem. Such imbalanced datasets are standard in real-world situations such as fraud detection, medical diagnosis, and computational advertising. We propose an iterative data augmentation method, MixBoost, which intelligently selects (Boost) and then combines (Mix) instances from the majority and minority classes to generate synthetic hybrid instances that have characteristics of …
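The "Mix" step described in the abstract follows the standard Mixup interpolation of inputs and labels. A minimal sketch of that interpolation (the `mixup` helper and the `alpha=0.2` value are illustrative, not the paper's exact selection-and-mixing procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x_a, y_a, x_b, y_b, alpha=0.2):
    """Convex combination of two instances and their one-hot labels (standard Mixup)."""
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    x = lam * x_a + (1 - lam) * x_b       # hybrid input
    y = lam * y_a + (1 - lam) * y_b       # soft label
    return x, y

# Mix a majority-class instance with a minority-class one
x_maj, y_maj = np.array([1.0, 2.0]), np.array([1.0, 0.0])
x_min, y_min = np.array([3.0, 0.0]), np.array([0.0, 1.0])
x_new, y_new = mixup(x_maj, y_maj, x_min, y_min)
```

The resulting hybrid lies on the line segment between the two inputs, and its soft label still sums to one.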

Cited by 4 publications (4 citation statements)
References 22 publications
“…), as data augmentation methods, have not only achieved notable success in a wide range of machine learning problems such as supervised learning [8], semi-supervised learning [54,55], and adversarial learning [56], but have also been adapted to different data forms such as images [57], texts [58,59], graphs [60], and speech [61]. Notably, to alleviate the problem of class imbalance in the dataset, a series of methods [9,10,62] employ Mixup to augment the data. Despite this, there has not been any research on using Mixup to solve the class imbalance problem in hierarchical multi-label classification.…”
Section: Mixup (mentioning)
confidence: 99%
“…but also adapted to different data forms such as images [5], texts [35,38], graphs [32], and speech [44]. Notably, to alleviate the problem of class imbalance in the dataset, a series of methods [6,7,9,16,20] employ Mixup to augment the data. Despite this, there has not been any research on using MixUp to solve the class imbalance problem in hierarchical multi-label classification…”
Section: Mixup (mentioning)
confidence: 99%
“…ReMix [6] mixes the inputs while keeping the minority class label, instead of mixing the labels. Similarly, MixBoost [14] attempts to combine active learning with Mixup to select which training samples to mix from each category, adding an extra layer of complexity to the sampling process. Another popular technique that is related to our approach is SMOTE [5].…”
Section: Introduction (mentioning)
confidence: 99%
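The ReMix variant quoted above interpolates the inputs as in Mixup but assigns the minority-class label outright rather than mixing labels. A hedged sketch of that idea (the `remix` function name and `alpha` value are illustrative, not ReMix's published configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def remix(x_maj, x_min, y_min, alpha=0.2):
    """Mix inputs as in Mixup, but keep the minority label (ReMix-style)."""
    lam = rng.beta(alpha, alpha)
    x = lam * x_maj + (1 - lam) * x_min
    return x, y_min  # label is NOT interpolated

x_mixed, y = remix(np.array([1.0, 2.0]), np.array([3.0, 0.0]), y_min=1)
```

Keeping the hard minority label biases the synthetic instances toward the rare class, which is the point of using Mixup for oversampling rather than for general regularization.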