2021
DOI: 10.48550/arxiv.2109.00201
Preprint

An Empirical Study on the Joint Impact of Feature Selection and Data Re-sampling on Imbalance Classification

Abstract: In predictive tasks, real-world datasets often exhibit imbalanced (i.e., long-tailed or skewed) class distributions of varying degrees. While the majority (head, or most frequent) classes have sufficient samples, the minority (tail, less frequent, or rare) classes can be under-represented by a rather limited number of samples. Data pre-processing has been shown to be very effective in dealing with such problems. On one hand, data re-sampling is a common approach to tackling class imbalance. On the …
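The title pairs feature selection with data re-sampling; a minimal sketch of that joint pipeline, assuming scikit-learn's SelectKBest for the selection step and imbalanced-learn's SMOTE for the re-sampling step (neither method is named in the truncated abstract; both are illustrative choices):

```python
# Minimal sketch of combining feature selection with data re-sampling
# on an imbalanced dataset. SelectKBest and SMOTE are illustrative
# choices; the paper studies the joint impact of such combinations.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from imblearn.over_sampling import SMOTE

# Synthetic imbalanced data: roughly 10% minority class.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

# Step 1: feature selection (keep the 10 most informative features).
X_sel = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Step 2: re-sampling (oversample the minority class with SMOTE).
X_res, y_res = SMOTE(random_state=0).fit_resample(X_sel, y)

# X_res/y_res now feed a downstream classifier; the paper's question
# is how choosing and ordering these two steps affects performance.
```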

Cited by 1 publication (2 citation statements) | References 47 publications

“…The level of peakedness or non-Gaussian behavior in the frequency domain. This value was calculated for the frequency bands [0,1.5] Hz and [1,4] Hz…”
Section: Spectral Kurtosis
Mentioning confidence: 99%
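The citing work computes band-limited spectral kurtosis as a feature; a minimal sketch of one common reading of that description, assuming Welch's power spectral density and the sample kurtosis of the in-band spectrum (the citing paper's exact estimator is not given here):

```python
# Band-limited spectral kurtosis as described in the citation
# statement: kurtosis (peakedness / non-Gaussianity) of the power
# spectrum within a frequency band. Welch's PSD is an assumption;
# the citing paper does not specify its estimator.
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis

def band_spectral_kurtosis(x, fs, f_lo, f_hi):
    """Kurtosis of the Welch PSD restricted to [f_lo, f_hi] Hz."""
    freqs, psd = welch(x, fs=fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return kurtosis(psd[band])

# Example: a 1 Hz sinusoid plus noise, sampled at 50 Hz.
fs = 50.0
t = np.arange(0, 60, 1 / fs)
x = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.random.randn(t.size)

# The two bands mentioned in the citation statement.
for lo, hi in [(0.0, 1.5), (1.0, 4.0)]:
    print(f"[{lo}, {hi}] Hz: {band_spectral_kurtosis(x, fs, lo, hi):.2f}")
```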
“…The problem of class imbalance arises when some classes (or categories) have a significantly smaller number of samples than others, leading to a model that is less likely to detect those minority classes because the training set contains too few of their samples for proper learning. This problem presents itself in various domains and applications, including but not limited to security, finance, environment, agriculture, and health (1)(2)(3)(4). Typically, class imbalance is mitigated either at the model level, by adapting and adjusting the training procedure based on the different data samples and training progression, or at the data level, by modifying the class distributions in such a way as to allow for improved class separability, typically via resampling (5)(6)(7).…”
Section: Introduction
Mentioning confidence: 99%
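The statement distinguishes model-level from data-level mitigation; a minimal sketch contrasting the two, assuming scikit-learn's class_weight option for the former and imbalanced-learn's RandomOverSampler for the latter (both are illustrative choices, not methods named by the citing paper):

```python
# Contrast of the two mitigation levels described above.
# class_weight='balanced' and RandomOverSampler are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import RandomOverSampler

X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)

# Model level: reweight the loss so minority-class errors cost more.
clf_weighted = LogisticRegression(class_weight="balanced", max_iter=1000)
clf_weighted.fit(X, y)

# Data level: modify the class distribution before training.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)
clf_resampled = LogisticRegression(max_iter=1000)
clf_resampled.fit(X_res, y_res)
```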