2021
DOI: 10.32604/cmc.2021.012547
Dealing with Imbalanced Dataset Leveraging Boundary Samples Discovered by Support Vector Data Description

Abstract: These days, imbalanced datasets, denoted throughout the paper by ID (datasets containing some, usually two, classes where one class has a considerably smaller number of samples than the other(s)), emerge in many real-world problems (such as health-care systems, disease-diagnosis systems, anomaly detection, fraud detection, stream-based malware-detection systems, and so on). These datasets cause problems such as under-training of the minority class(es), over-training of the majority class(es), and bias towards maj…

Cited by 11 publications (4 citation statements)
References 63 publications (67 reference statements)
“…When the dataset is unbalanced and the MSE function is used as the loss function, it makes the machine learning model more inclined to predict classes with large sample numbers [41]. Data sampling is commonly used in many studies to address data imbalance, but such methods can have an unexpected impact on the data: undersampling can result in missing data, oversampling is blind in generating the data, and data sampling, in general, can easily marginalize data [42][43][44]. In this study, we attempted to use a focal loss (FL) [45] to address the data imbalance problem.…”
Section: Applying the CMA-ES To Train The DNM
confidence: 99%
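The focal loss mentioned in the statement above down-weights well-classified examples so that training gradients concentrate on hard, often minority-class, samples. A minimal sketch of the binary form (the `focal_loss` name, `gamma`, and `alpha` defaults are illustrative, following the commonly cited formulation FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t)):

```python
import numpy as np

def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: easy examples (p_t near 1) get a small
    (1 - p_t)^gamma modulating factor, so their contribution shrinks."""
    p_pred = np.clip(p_pred, eps, 1 - eps)          # avoid log(0)
    p_t = np.where(y_true == 1, p_pred, 1 - p_pred)  # prob. of the true class
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)
```

With `gamma = 0` and `alpha = 0.5` this reduces to (half of) the ordinary cross-entropy, so the loss can be tuned smoothly between the two.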
“…Unlike under-sampling method, oversampling method balances the class distribution by replicating minority class samples until it can reach the number of majority class samples. For example, these oversampling methods in Zhu et al (2021), Luo et al (2021), Amit and Chinmay (2021) and Li et al (2021) achieve ideal classification results, but they can only learn on specific decision regions of the replicated data, which limits the learning ability of a classifier.…”
Section: Related Work
confidence: 99%
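The replication-based oversampling described above can be sketched in a few lines: minority-class rows are drawn with replacement until the class counts match. The `random_oversample` helper is illustrative, not from the cited works; note how it can only re-emit existing points, which is exactly the limited-decision-region weakness the statement raises.

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Balance a binary dataset by replicating minority-class rows
    (sampled with replacement) until both classes have equal size."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_needed = counts.max() - counts.min()
    idx = np.flatnonzero(y == minority)           # minority-row indices
    extra = rng.choice(idx, size=n_needed, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])
```

Methods like SMOTE instead interpolate between minority neighbours to synthesize new points rather than duplicating old ones.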
“…C-SMOTE [26], an over-sampling method based on clustering, clusters the positive and negative classes separately, which can solve not only the problem of imbalance between classes but also the problem of imbalance within classes. Another approach re-samples using the easily misjudged boundary samples found by Support Vector Data Description (SVDD) [27]; similar resampling methods include [28,29].…”
Section: Unbalanced Data Processing
confidence: 99%
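The SVDD-based idea above hinges on finding samples near the enclosing boundary, since those are the ones most easily misjudged. A minimal sketch, assuming scikit-learn is available and using `OneClassSVM` with an RBF kernel (which is closely related to SVDD for such kernels; the `boundary_samples` helper and its `nu` choice are illustrative, not the cited paper's method):

```python
import numpy as np
from sklearn.svm import OneClassSVM

def boundary_samples(X, nu=0.2, gamma="scale"):
    """Fit a one-class SVM and return the indices of its support
    vectors, i.e. the samples lying on or outside the learned
    boundary -- a proxy for SVDD's easily misjudged boundary points."""
    model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X)
    return model.support_
```

The returned indices could then feed a resampling step that replicates or synthesizes points around the class boundary instead of uniformly over the minority class.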