2021
DOI: 10.1016/j.artmed.2020.101987
Overly optimistic prediction results on imbalanced data: a case study of flaws and benefits when applying over-sampling

Cited by 54 publications (71 citation statements) | References 35 publications
“…To computationally deal with this imbalance, we used the SMOTE technique [ 55 ] to oversample the minority class in the training sets after partitioning the data during the learning process. It is essential to oversample after data partitioning to keep the test data representative of the original distribution of the dataset and avoid information leakage that can lead to overly optimistic prediction results [ 56 ].…”
Section: Methods (mentioning)
confidence: 99%
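The practice described in this statement can be illustrated with a short, hypothetical sketch (not the cited authors' code), assuming scikit-learn and imbalanced-learn: the data are partitioned first, and SMOTE is fit only on the training portion, so the test set keeps the original class distribution.

```python
# Sketch of oversampling *after* partitioning (assumes scikit-learn and
# imbalanced-learn are installed; the synthetic data below is a placeholder).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Hypothetical imbalanced dataset (roughly 9:1 class ratio) standing in for real data.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# 1) Partition first, stratifying so the test set mirrors the original imbalance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2) Oversample the minority class in the training set only.
#    The test set is left untouched, so no synthetic samples leak into evaluation.
X_train_res, y_train_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
```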
“…It is essential to oversample after data partitioning to keep the test data representative of the original distribution of the dataset and avoid information leakage that can lead to overly optimistic prediction results. 38…”
Section: Methods (mentioning)
confidence: 99%
“…It is essential to oversample after data partitioning to keep the test data representative of the original distribution of the dataset and avoid information leakage that can lead to overly optimistic prediction results. 38 We investigated different supervised classification algorithms on the selected features and evaluated the results. Specifically, we applied six algorithms: logistic regression (LR), random forest (RF), support vector machines (SVM), XGBoost (XGB), k-nearest neighbors (KNN), and deep neural network (DNN).…”
Section: Machine Learning Approach (mentioning)
confidence: 99%
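A minimal sketch of that kind of comparison is given below, assuming scikit-learn and imbalanced-learn with default hyper-parameters; XGBoost and the deep neural network are omitted for brevity, and this is not the cited authors' pipeline. Placing SMOTE inside the pipeline restricts oversampling to each training fold, consistent with the leakage point above.

```python
# Sketch of comparing several classifiers with cross-validation, oversampling
# inside each fold via an imbalanced-learn Pipeline (all defaults assumed).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # applies SMOTE only to training folds

# Hypothetical imbalanced data and a stratified train/test split, as in the sketch above.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    # SMOTE sits inside the pipeline, so each fold oversamples its own training
    # split and evaluates on untouched held-out data.
    pipe = Pipeline([("smote", SMOTE(random_state=0)), ("clf", clf)])
    scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```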
“…Apart from the above augmentation approaches, we also look into a specific oversampling technique [38] to check whether any methodological flaw exists in polar unrolling as a key part of the augmentation step. Oversampling the data can be an alternative to augmentation given the imbalanced data.…”
Section: Data Augmentation (mentioning)
confidence: 99%