2022
DOI: 10.3390/app12083928

An Empirical Assessment of Performance of Data Balancing Techniques in Classification Task

Abstract: Many real-world classification problems such as fraud detection, intrusion detection, churn prediction, and anomaly detection suffer from the problem of imbalanced datasets. Therefore, in all such classification tasks, we need to balance the imbalanced datasets before building classifiers for prediction purposes. Several data-balancing techniques (DBT) have been discussed in the literature to address this issue. However, not much work is conducted to assess the performance of DBT. Therefore, in this research p…

Cited by 14 publications (7 citation statements)
References 40 publications
“…Unbalanced datasets are relevant and commonly observed in pathology detection problems that can significantly impact the classification performance of machine learning models. Several solutions have been proposed to deal with unbalanced datasets ( 44 , 45 ) and the problem was solved by data resampling at the pre-processing data level. The basic idea of unbalance is to resample the original dataset, either by oversampling the smallest class or subsampling the largest class until the class sizes are approximately the same.…”
Section: Data Balancing Techniques (mentioning)
confidence: 99%
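The resampling idea described in this citation statement (oversample the smallest class or subsample the largest class until class sizes match) can be illustrated with a minimal sketch. It is not the method of the cited paper or of the citing authors: the function names `random_oversample` and `random_undersample`, the toy data, and the fixed seed are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(rows, labels):
    """Collect rows into lists keyed by their class label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of the smaller
    classes until every class is as large as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        # Extra copies are drawn with replacement; the largest class needs none.
        extras = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extras)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Hypothetical helper: keep a random subset of each class so that every
    class is as small as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        # Sampling without replacement discards the surplus rows.
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


if __name__ == "__main__":
    # Toy data (assumed): 9 majority rows and 3 minority rows.
    rows = list(range(12))
    labels = ["majority"] * 9 + ["minority"] * 3
    _, over_labels = random_oversample(rows, labels)
    _, under_labels = random_undersample(rows, labels)
    print(Counter(over_labels))
    print(Counter(under_labels))
```

On the toy data, oversampling yields 9 rows per class and undersampling yields 3 rows per class. The trade-off noted in the citation statements shows up directly: oversampling keeps every original row but adds duplicates, while undersampling discards rows from the larger class.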
“…The weakness of this method is that if the dataset is large, it can introduce a significant additional computational load and the duplication of information due to the oversampling of the minority class instances, which can lead to the overfitting of the model. However, this method retains all important information, unlike the US method ( 44 ).…”
Section: Data Balancing Techniques (mentioning)
confidence: 99%
“…Data balancing [53,54] is crucial to addressing class imbalance and making sure that machine learning models are impartial, legitimate, and powerful. It increases the performance of the model, averts bias, enhances generalizability, facilitates better learning of features, prevents overfitting, and increases the model's stability to change in concept.…”
Section: Data Balancing For Classification (mentioning)
confidence: 99%
“…There are 450176 urls in the dataset. Imbalanced dataset affects the classification process which gives a skewed result [13]. To avoid such issue, the experiment uses 10000 benign and 10000 malicious urls.…”
Section: A Raw Dataset (mentioning)
confidence: 99%