2020
DOI: 10.3390/app10041276
Data Sampling Methods to Deal With the Big Data Multi-Class Imbalance Problem

Abstract: The class imbalance problem has been a hot topic in the machine learning community in recent years. Nowadays, in the era of big data and deep learning, the problem remains in force. Much work has been performed to deal with the class imbalance problem, with the random sampling methods (over- and under-sampling) being the most widely employed approaches. Moreover, more sophisticated sampling methods have been developed, including the Synthetic Minority Over-sampling Technique (SMOTE), and they have also been combined with…
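The random over- and under-sampling baselines the abstract describes can be sketched in a few lines of NumPy. This is a minimal illustration of the general idea, not the paper's implementation; the function name `random_resample` and its parameters are our own.

```python
import numpy as np

def random_resample(X, y, mode="over", rng=0):
    """Random sampling baseline: 'over' duplicates samples of smaller
    classes up to the majority count; 'under' discards samples of larger
    classes down to the minority count."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max() if mode == "over" else counts.min()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # sample with replacement only when we need more copies than exist
        idx.append(rng.choice(members, size=target, replace=(target > n)))
    idx = np.concatenate(idx)
    return X[idx], y[idx]
```

Over-sampling risks duplicating noise; under-sampling discards information, which is why the abstract notes that more sophisticated methods such as SMOTE were developed.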

Cited by 79 publications (47 citation statements)
References 64 publications
“…Imbalance data remains a key challenge against classification models [15,18]. The majority of literature considered re-sampling approaches, i.e., both over-sampling and under-sampling, to alleviate degradation due to the issue of imbalanced data [1,17,19,33,37].…”
Section: Theoretical Background
confidence: 99%
“…The low volume of the potential target/important customer data (i.e., imbalanced data distribution) is a major challenge in extracting the latent knowledge in banks marketing data [1,3,10]. There is still an insisting need for handling the imbalanced dataset distribution reliably [15][16][17]; commonly used approaches [1,15,16,[18][19][20][21] impose processing overhead or lead to loss of information.…”
Section: Introduction
confidence: 99%
“…We used two established approaches, namely the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbors (ENN), to balance the IoT-23, LITNET-2020, and NetML-2020 datasets [ 22 , 23 ]. Recently, hybrid approaches have become popular [ 24 ]. Methods like SMOTE+ENN, among other, have often been utilized for alleviating the issue of class imbalance to boost the efficiency of the classifier.…”
Section: Proposed Methodology
confidence: 99%
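The SMOTE interpolation idea this citation refers to can be sketched as follows. This is a hedged NumPy illustration of the technique, not the cited papers' implementation; the function name `smote` and its defaults are assumptions.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=0):
    """Sketch of SMOTE: each synthetic sample is a random point on the
    segment between a minority sample and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]            # k nearest neighbours per sample
    base = rng.integers(0, len(X_min), n_new)    # random base samples
    neigh = nn[base, rng.integers(0, k, n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

Because every synthetic point lies between two existing minority samples, SMOTE stays inside the minority region rather than duplicating exact copies, which is what distinguishes it from random over-sampling.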
“…The count of samples in each class in the pre-processed dataset is subjected to the balancing procedure. Following [ 24 ], the approach for class balancing is presented in Algorithm 1: SMOTE+ENN.…”
Section: Proposed Methodology
confidence: 99%
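The Algorithm 1 the citation mentions is not reproduced on this page. A minimal self-contained sketch of the general SMOTE+ENN idea (first oversample minority classes by interpolation, then clean the result with Edited Nearest Neighbours) might look like the following; the helper names, defaults, and structure are our own assumptions, not the cited algorithm.

```python
import numpy as np

def _knn(A, B, k):
    """Indices of the k nearest rows of B for each row of A (Euclidean)."""
    d = np.linalg.norm(A[:, None] - B[None, :], axis=-1)
    return np.argsort(d, axis=1)[:, :k]

def smote_enn(X, y, k=3, rng=0):
    """Sketch of SMOTE+ENN: oversample each minority class by SMOTE
    interpolation, then drop samples whose label disagrees with the
    majority label of their k nearest neighbours (ENN editing)."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Xs, ys = [X], [y]
    # SMOTE step: synthesise points along segments between minority neighbours
    for c, n in zip(classes, counts):
        if n == target or n < 2:
            continue
        Xc = X[y == c]
        kk = min(k, n - 1)
        nn = _knn(Xc, Xc, kk + 1)[:, 1:]          # skip the self-neighbour
        base = rng.integers(0, n, target - n)
        neigh = nn[base, rng.integers(0, kk, target - n)]
        gap = rng.random((target - n, 1))
        Xs.append(Xc[base] + gap * (Xc[neigh] - Xc[base]))
        ys.append(np.full(target - n, c))
    X2, y2 = np.vstack(Xs), np.concatenate(ys)
    # ENN step: keep only samples consistent with their neighbourhood
    nn = _knn(X2, X2, k + 1)[:, 1:]
    keep = [i for i in range(len(y2))
            if np.bincount(y2[nn[i]].astype(int)).argmax() == y2[i]]
    return X2[keep], y2[keep]
```

The hybrid pairs the two methods' strengths: SMOTE fills out the minority region, while ENN removes the noisy or borderline samples that interpolation can introduce, which is why the citing paper reports using the combination over either method alone.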