2018
DOI: 10.3389/fchem.2018.00362

Prediction Is a Balancing Act: Importance of Sampling Methods to Balance Sensitivity and Specificity of Predictive Models Based on Imbalanced Chemical Data Sets

Abstract: Increase in the number of new chemicals synthesized in past decades has resulted in constant growth in the development and application of computational models for prediction of activity as well as safety profiles of the chemicals. Most of the time, such computational models and their applications must deal with imbalanced chemical data. It is indeed a challenge to construct a classifier using an imbalanced data set. In this study, we analyzed and validated the importance of different sampling methods over non-sampli…

Cited by 126 publications (98 citation statements)
References 33 publications
“…Imbalanced data are known to introduce a high degree of classification bias to model performance (e.g., sensitivity, specificity) such that machine learning algorithms are almost never able to predict the minority class and the majority class almost always has inflated model performance. Thus, when using a highly imbalanced data set, we often see a large gap between the sensitivity and specificity of machine learning models and a high misclassification rate for the minority class. In order to solve such issues, various strategies such as oversampling or undersampling have been proposed to reduce the inherent bias resulting from imbalanced data.…”
Section: Methods (mentioning)
confidence: 99%
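
The gap between sensitivity and specificity described in the statement above can be illustrated with a small sketch. The data set (95 inactive, 5 active) and the "always predict the majority class" classifier are hypothetical and are not taken from the cited paper; the function name `sensitivity_specificity` is likewise an illustrative choice.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes so the ratios stay well defined.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical imbalanced data: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

sens, spec = sensitivity_specificity(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case accuracy looks high while every minority-class sample is misclassified, which is why sensitivity and specificity are reported separately for imbalanced data.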
“…In order to solve such issues, various strategies such as oversampling or undersampling have been proposed to reduce the inherent bias resulting from imbalanced data. Oversampling has been shown to reduce the gap between sensitivity and specificity and to lower the misclassification rate for the minority class. On the other hand, if not done properly, oversampling can result in overfitting issues such as obtaining perfect accuracy and AUC when in reality they are not perfect.…”
Section: Methods (mentioning)
confidence: 99%
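
The statement above does not say what doing oversampling "properly" involves. One common reading, assumed here, is that duplicating minority samples before the train/test split lets copies of the same sample land on both sides and inflates the measured performance. The sketch below therefore splits first and oversamples only the training portion. It uses plain random oversampling on hypothetical data; the helper names `stratified_split` and `random_oversample` are illustrative and do not describe the cited authors' procedure.

```python
import random


def stratified_split(y, test_fraction, rng):
    """Return (train_idx, test_idx), holding out test_fraction of each class."""
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def random_oversample(X, y, rng):
    """Duplicate randomly chosen samples until every class matches the largest."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    target = max(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(idx)
        # Draw the missing samples with replacement from the same class.
        keep.extend(rng.choice(idx) for _ in range(target - len(idx)))
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


rng = random.Random(0)
X = list(range(100))        # hypothetical compound IDs standing in for features
y = [0] * 90 + [1] * 10     # hypothetical 90:10 class imbalance

# Split first, so the held-out test set keeps its original class ratio.
train_idx, test_idx = stratified_split(y, 0.2, rng)
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

# Oversample only the training portion.
X_bal, y_bal = random_oversample(X_train, y_train, rng)
print(len(y_bal), y_bal.count(0), y_bal.count(1), len(y_test))
```

Because the test indices are fixed before any duplication, no oversampled copy can appear in the test set, so sensitivity and specificity measured there reflect the original class ratio.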
“…The kappa value is 0.69 for the model (Banerjee et al 2018b). The model applied to predict hepatotoxicity is described in detail in the literature (Banerjee et al 2018a).…”
Section: Prediction Of Toxicity and Drug-likeness (mentioning)
confidence: 99%
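
The statement above reports a kappa value without defining it. Assuming it refers to Cohen's kappa, the statistic compares observed agreement $p_o$ with the agreement expected by chance $p_e$ as $\kappa = (p_o - p_e) / (1 - p_e)$. The sketch below computes it on hypothetical labels; it does not reproduce the 0.69 reported for the cited model.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    # Observed agreement: fraction of samples where the labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_e == 1:
        return float("nan")  # undefined when both labelings use a single class
    return (p_o - p_e) / (1 - p_e)


# Hypothetical predictions: 8 of 10 actives found, 5 false positives among 90 inactives.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 85
kappa = cohen_kappa(y_true, y_pred)
print(f"kappa={kappa:.3f}")
```

Unlike raw accuracy, kappa discounts the agreement a classifier would achieve by following the class frequencies alone, which makes it a stricter summary on imbalanced data.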