Prediction Is a Balancing Act: Importance of Sampling Methods to Balance Sensitivity and Specificity of Predictive Models Based on Imbalanced Chemical Data Sets

Banerjee, Priyanka; Dehnbostel, Frederic O.; Preißner, Robert

doi:10.3389/fchem.2018.00362

Cited by 126 publications

(98 citation statements)

References 33 publications


“…Imbalanced data are known to introduce a high degree of classification bias to model performance (eg, sensitivity, specificity) such that the machine learning algorithms are almost never able to predict the minority class and that the majority class almost always has inflated model performance. Thus, when using a highly imbalanced data set, we often see a large gap between sensitivity and specificity of machine learning models and a high misclassification rate for the minority class. In order to solve such issues, various strategies such as oversampling or undersampling have been proposed to reduce the inherent bias resulting from imbalanced data.…”

Section: Methods (mentioning)

confidence: 99%
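The gap between sensitivity and specificity described in the statement above can be illustrated with a small sketch. The data below are synthetic and the helper name `sensitivity_specificity` is hypothetical; neither comes from the cited papers. A classifier that always predicts the majority class on a 5%/95% data set looks accurate while never detecting the minority class.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return sensitivity, specificity


# Synthetic, illustrative data: 5 minority (positive) and 95 majority (negative) samples.
y_true = [1] * 5 + [0] * 95
# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"accuracy={accuracy:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Here accuracy is 0.95 and specificity is 1.0, yet sensitivity is 0.0, which is why accuracy alone can be misleading on imbalanced data.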

“…In order to solve such issues, various strategies such as oversampling or undersampling have been proposed to reduce the inherent bias resulting from imbalanced data. Oversampling has demonstrated to be able to reduce the gap between sensitivity and specificity and lower the misclassification rate for the minority class. On the other hand, if not done properly, oversampling can result in overfitting issues such as obtaining perfect accuracy and AUC when in reality they are not perfect.…”

Section: Methods (mentioning)

confidence: 99%

“…Thus, when using highly imbalanced data set, we often see a large gap between sensitivity and specificity of machine learning models and a high misclassification rate for the minority class. 30,31 In order to solve such issues, various strategies such as oversampling or undersampling have been proposed to reduce the inherent bias resulting from imbalanced data.…”

Section: Analytical Approach (mentioning)

confidence: 99%

“…Oversampling has demonstrated to be able to reduce the gap between sensitivity and specificity and lower the misclassification rate for the minority class. 30,31 On the other hand, if not done properly, oversampling can result in overfitting issues such as obtaining perfect accuracy and AUC when in reality they are not perfect. In this study, we strived to minimise overfitting issues by using a separate validation sample for model validation.…”

Section: Analytical Approach (mentioning)

confidence: 99%
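One common way to keep oversampling from inflating validation metrics is to split the data first and oversample only the training portion, so that no duplicated minority sample can appear in the held-out set. The sketch below shows that idea with plain random oversampling. It is not the procedure used in the cited study: the function name `split_then_oversample`, the 20% validation fraction, and the convention that label 1 is the minority class are all assumptions made for illustration.

```python
import random


def split_then_oversample(samples, labels, valid_fraction=0.2, seed=0):
    """Hold out a validation set, then randomly oversample label 1 in the training set only."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)

    # Split BEFORE oversampling so duplicates never leak into validation.
    n_valid = int(len(indices) * valid_fraction)
    valid = [(samples[i], labels[i]) for i in indices[:n_valid]]
    train = [(samples[i], labels[i]) for i in indices[n_valid:]]

    minority = [row for row in train if row[1] == 1]
    majority = [row for row in train if row[1] == 0]

    # Duplicate randomly chosen minority rows until both classes are equal in size.
    if minority and len(minority) < len(majority):
        extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
        train = train + extra

    rng.shuffle(train)
    return train, valid


# Synthetic, illustrative data: 10 minority and 90 majority samples.
samples = list(range(100))
labels = [1] * 10 + [0] * 90
train, valid = split_then_oversample(samples, labels)
print(len(train), len(valid))
```

The validation set keeps the original class ratio, so sensitivity and specificity measured on it reflect performance on data the model has not seen in any duplicated form.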


Application of machine learning for diagnostic prediction of root caries

et al. 2019


Objective: This study sought to utilise machine learning methods in artificial intelligence to select the most relevant variables in classifying the presence and absence of root caries and to evaluate the model performance.

Background: Dental caries is one of the most prevalent oral health problems. Artificial intelligence can be used to develop models for identification of root caries risk and to gain valuable insights, but it has not been applied in dentistry. Accurately identifying root caries may guide treatment decisions, leading to better oral health outcomes.

Methods: Data were obtained from the 2015‐2016 National Health and Nutrition Examination Survey and were randomly divided into training and test sets. Several supervised machine learning methods were applied to construct a tool that was capable of classifying variables into the presence and absence of root caries. Accuracy, sensitivity, specificity and area under the receiver operating curve were computed.

Results: Of the machine learning algorithms developed, support vector machine demonstrated the best performance with an accuracy of 97.1%, precision of 95.1%, sensitivity of 99.6% and specificity of 94.3% for identifying root caries. The area under the curve was 0.997. Age was the feature most strongly associated with root caries.

Conclusion: The machine learning algorithms developed in this study perform well and allow for clinical implementation and utilisation by dental and nondental professionals. Clinicians are encouraged to adopt the algorithms from this study for early intervention and treatment of root caries for the ageing population of the United States, and for attaining precision dental medicine.


“…The kappa value is 0.69 for the model (Banerjee et al 2018b). The model applied to predict hepatotoxicity is described in detail in the literature (Banerjee et al 2018a).…”

Section: Prediction Of Toxicity and Drug-likeness (mentioning)

confidence: 99%

Synthesis and preliminary hepatotoxicity evaluation of new caffeine-8-(2-thio)-propanoic hydrazid-hydrazone derivatives

Mitkov, Kondeva-Burdina, Zlatkov

2019

PHAR


A new series of caffeine-8-(2-thio)-propanoic hydrazid-hydrazone derivatives was designed and synthesized. The target compounds were obtained in yields of 51 to 96%, and their structures were elucidated by FTIR, 1H NMR, 13C NMR, MS and microanalyses. All of the compounds were found to be “drug-like”, as they fulfill the criteria of drug-likeness, which include the MDDR-like rule. The tested compounds were subjected to in silico prediction of substrate/metabolite specificity and Drug Induced Liver Injury (DILI). The prediction indicated that the evaluated compounds would most probably act as CYP1A2 substrates. The in vitro studies performed did not reveal statistically significant hepatotoxicity of the tested compounds, probably due to the pro-oxidant effects expressed on the sub-cellular (isolated rat liver microsomes) level. The experimental results obtained confirmed the predicted low hepatotoxicity of the tested structures. Based on these results, the compounds may be considered promising structures for the design of future molecules with low hepatotoxicity.


In Silico Platforms for Predictive Ecotoxicology

Lee, Sung

2021

Chemometrics and Cheminformatics in Aquatic Toxicology

