Preprint (2022)
DOI: 10.1101/2022.07.18.500262

Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data

Abstract: Machine learning (ML) is becoming a standard tool in neuroscience and neuroimaging research. Yet, because it is such a powerful tool, the appropriate application of ML requires a sound understanding of its subtleties and limitations. In particular, applying ML to datasets with imbalanced classes, which are very common in neuroscience, can have severe consequences if not adequately addressed. With the neuroscience machine-learning user in mind, this technical note provides a didactic overview of the class imbal…

Cited by 4 publications (2 citation statements) | References 79 publications

Citation statements:
“…Accuracy is a well-accepted measure for evaluating the performance of a classification problem. However, for an imbalanced dataset, the use of accuracy as an effective indicator has recently been questioned by various authors [49][50][51][52][53]. Therefore, alternative evaluation metrics for assessing the effectiveness of ML models on imbalanced datasets were explored, as accuracy alone is not trustworthy.…”
Section: Results
confidence: 99%
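To make the point in this statement concrete, the following is a minimal sketch, not taken from the cited paper: it trains a logistic-regression classifier on a synthetic 90/10 imbalanced dataset and contrasts plain accuracy with balanced accuracy, F1, and the Matthews correlation coefficient using scikit-learn. The class ratio, model, and sample sizes are arbitrary choices for illustration.

```python
# Sketch: why accuracy alone can mislead on imbalanced data (hypothetical setup).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, matthews_corrcoef)

# Synthetic binary problem with a 90/10 class imbalance.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

# Plain accuracy can look high even when the minority class is poorly predicted;
# balanced accuracy, F1 and the Matthews correlation weight the minority class more fairly.
print("accuracy:            ", accuracy_score(y_te, y_hat))
print("balanced accuracy:   ", balanced_accuracy_score(y_te, y_hat))
print("F1 (minority class): ", f1_score(y_te, y_hat))
print("Matthews correlation:", matthews_corrcoef(y_te, y_hat))
```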
“…To address the imbalance in the dataset and minimise error, we utilised an over-sampling approach to balance the sample (91). This may introduce noise into the synthetic samples in the dataset, resulting in some level of bias remaining in the models (92). In this study, we also used an integrative approach and included multiple features in a machine learning model with a 5-fold cross-validation technique, which enabled us to evaluate the performance of the model on multiple subsets of the training set.…”
Section: Discussion
confidence: 99%
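The sketch below illustrates the general pattern described in this statement, assuming SMOTE over-sampling and a logistic-regression classifier as stand-ins for whichever over-sampler and model the citing study actually used. Wrapping the over-sampler inside an imbalanced-learn pipeline ensures that synthetic samples are generated from the training folds only, so the noise the authors mention does not leak into the validation folds of the 5-fold cross-validation.

```python
# Sketch: over-sampling applied only within training folds of 5-fold CV
# (SMOTE and logistic regression are illustrative assumptions, not the cited study's exact setup).
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced dataset standing in for the study's features.
X, y = make_classification(n_samples=1000, n_features=30,
                           weights=[0.85, 0.15], random_state=0)

pipe = Pipeline([
    ("oversample", SMOTE(random_state=0)),        # fitted on each training fold only
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="balanced_accuracy")
print("balanced accuracy per fold:", scores)
print("mean:", scores.mean())
```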