2020
DOI: 10.1021/acs.jcim.0c00476

Comparison of Scaling Methods to Obtain Calibrated Probabilities of Activity for Protein–Ligand Predictions

Abstract: In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a probability of binding to a protein target has not yet been satisfactorily addressed. In this study, we compared the performance of three calibration methods, namely, Platt scaling (PS), isotonic regression (IR), and Venn–ABERS predictors (VA), in calibrating prediction scores obtained from ligand–target prediction comprising the Naïve Bayes, support vector machines, and random forest (RF) algori…
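The abstract names the calibration methods without describing how they work. Below is a minimal sketch of the first two in plain Python, assuming raw classifier scores and 0/1 activity labels from a held-out calibration set. The function names (`platt_fit`, `isotonic_fit`, etc.) are hypothetical, and the gradient-descent fit is a simplification, not the procedure used in the paper. Platt scaling fits a two-parameter sigmoid to the scores, while isotonic regression fits a non-decreasing step function with the pool-adjacent-violators algorithm.

```python
import math
from itertools import groupby


def platt_fit(scores, labels, lr=0.1, epochs=5000):
    """Fit p(active | s) = 1 / (1 + exp(a*s + b)) by minimising log loss.

    Simplification (assumption): plain batch gradient descent on raw 0/1
    labels; Platt's original recipe uses a Newton-type optimiser and
    smoothed targets instead.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # With z = a*s + b and p = sigmoid(-z), d(log loss)/dz = y - p.
            grad_a += (y - p) * s
            grad_b += y - p
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_predict(params, s):
    """Calibrated probability of activity for a raw score s."""
    a, b = params
    return 1.0 / (1.0 + math.exp(a * s + b))


def isotonic_fit(scores, labels):
    """Pool-adjacent-violators: a non-decreasing step function of the score."""
    blocks = []  # each block: [sum of labels, count, highest score in block]
    pairs = sorted(zip(scores, labels))
    for s, group in groupby(pairs, key=lambda t: t[0]):
        ys = [y for _, y in group]  # tied scores are pooled from the start
        blocks.append([float(sum(ys)), len(ys), s])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def isotonic_predict(model, s):
    """Value of the first step whose upper score bound covers s."""
    for upper, p in model:
        if s <= upper:
            return p
    return model[-1][1]  # scores above the calibration range get the last step
```

For example, with calibration scores `[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]` and labels `[0, 1, 0, 1, 1, 1]`, `isotonic_fit` pools the second and third points into one step, so `isotonic_predict(model, 0.25)` returns 0.5. The sigmoid imposes a fixed shape that suits small calibration sets; the isotonic fit is more flexible but can overfit when calibration data are scarce.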


Cited by 15 publications
(11 citation statements)
References 79 publications
“…[224] Despite the advantages of using CNNs, it is important to keep in mind that their performance can be limited by data availability, as imaging datasets are often small and heavily conditional.[225] In fact, other important model characteristics, such as the applicability domain (where the model works reliably and where it does not, for example in areas of new chemical space; e.g., Reliability Density Neighbourhoods [226]) and prediction uncertainty (Venn-Abers, conformal prediction) [227,228], should also be considered alongside performance-based measures such as accuracy, AUROC and AUPRC. These characteristics are often neglected in bioactivity model evaluations, despite providing a measure of how confident one can be in new predictions (which is the ultimate goal of target prediction, and of any supervised ML model).…”
Section: Unsupervised Machine Learning | Citation type: mentioning | Confidence: 99%
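The Venn-Abers predictors mentioned in this statement can be sketched with the same pool-adjacent-violators idea used for isotonic regression. In the inductive variant, the test score is added to the calibration set twice, once labelled inactive and once labelled active; the two isotonic fits evaluated at the test score give a lower and an upper probability (p0, p1), which can be merged into a single value p1 / (1 - p0 + p1). This is a minimal, unoptimised sketch under those assumptions (each call refits from scratch); `venn_abers` and its helpers are hypothetical names, not a library API.

```python
from itertools import groupby


def _pava_blocks(scores, labels):
    """Isotonic fit as contiguous blocks [sum_labels, count, lo_score, hi_score]."""
    blocks = []
    pairs = sorted(zip(scores, labels))
    for s, group in groupby(pairs, key=lambda t: t[0]):
        ys = [y for _, y in group]  # pool tied scores
        blocks.append([float(sum(ys)), len(ys), s, s])
        # Merge backwards while the block means are not non-decreasing.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, _lo, hi = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][3] = hi
    return blocks


def _fitted_value(blocks, s):
    """Isotonic value at a score that was part of the fitted data."""
    for total, count, lo, hi in blocks:
        if lo <= s <= hi:
            return total / count
    raise ValueError("score not covered by the fitted blocks")


def venn_abers(cal_scores, cal_labels, test_score):
    """Return (p0, p1, merged probability) for one test score."""
    scores = list(cal_scores) + [test_score]
    # Hypothetical label 0 for the test object gives the lower bound p0.
    p0 = _fitted_value(_pava_blocks(scores, list(cal_labels) + [0]), test_score)
    # Hypothetical label 1 gives the upper bound p1.
    p1 = _fitted_value(_pava_blocks(scores, list(cal_labels) + [1]), test_score)
    return p0, p1, p1 / (1.0 - p0 + p1)
```

The width of the interval [p0, p1] shrinks as the calibration set grows, which is what lets Venn-Abers outputs double as an uncertainty signal for individual predictions.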
“…Ligand-based target prediction methods rely on the principle of chemical similarity, which assumes that compounds with similar chemical structures should exhibit similar biological effects (Mervin et al., 2018a; Mervin et al., 2018b; Mervin et al., 2020). While this principle generally holds across large datasets, it is not always valid, e.g., due to "activity cliffs", where the activity of a compound changes abruptly despite only minor changes in its chemical structure (Young et al., 2008; Stumpfe et al., 2019).…”
Section: Introduction | Citation type: mentioning | Confidence: 99%
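The similarity principle and its activity-cliff exceptions can be made concrete with the Tanimoto coefficient on binary fingerprints. The sketch below assumes fingerprints are already available as Python sets of on-bit indices (generating them with a cheminformatics toolkit is out of scope) and that potencies are on a log scale such as pIC50. The thresholds (Tanimoto of at least 0.8 and a potency gap of at least 2 log units, i.e. 100-fold) are illustrative assumptions, since published activity-cliff definitions vary; the function names and molecule data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for on-bit index sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def activity_cliffs(
    fps: Dict[str, Set[int]],
    potency: Dict[str, float],
    sim_threshold: float = 0.8,  # assumed similarity cut-off
    potency_gap: float = 2.0,  # assumed gap in log units (100-fold)
) -> List[Tuple[str, str, float, float]]:
    """Pairs that are structurally similar yet differ strongly in potency."""
    cliffs = []
    for m1, m2 in combinations(sorted(fps), 2):
        sim = tanimoto(fps[m1], fps[m2])
        gap = abs(potency[m1] - potency[m2])
        if sim >= sim_threshold and gap >= potency_gap:
            cliffs.append((m1, m2, sim, gap))
    return cliffs
```

Pairs returned by `activity_cliffs` are exactly the cases where a nearest-neighbour target prediction built on similarity alone would be expected to mislead.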
“…Such uncertainty needs to be incorporated into the decision-making process, for instance, using a binning or categorical system to compare CLint values for molecule prioritization. However, experimental variability estimates (aleatoric or statistical uncertainty) are typically not accessible in the public domain due to the lack of repeated measurements. Aleatoric or statistical uncertainty is the noise inherent to the experiments; it arises from data generation, is stochastic in nature, and is irreducible. Using internal data, experimental variability can be estimated from blind replicates measured over the years. Statistical uncertainty can be neither changed nor reduced, and it allows the upper performance limits of in silico models to be estimated. Uncertainty can also be estimated for model predictions (epistemic uncertainty) to detect compounds outside the applicability domain.…”
Section: Introduction | Citation type: mentioning | Confidence: 99%
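Experimental variability from blind replicates is commonly summarised as a pooled standard deviation across compounds measured more than once. A minimal sketch, assuming each inner list holds the replicate measurements of one compound on a log10 CLint scale (the scale and the function name `pooled_sd` are assumptions, not taken from the cited work):

```python
import math
from statistics import variance


def pooled_sd(replicate_groups):
    """Pooled standard deviation over compounds with two or more replicates.

    Each group contributes (n - 1) * sample variance to the numerator and
    n - 1 degrees of freedom to the denominator.
    """
    num = 0.0
    dof = 0
    for values in replicate_groups:
        n = len(values)
        if n < 2:
            continue  # a single measurement carries no variance information
        num += (n - 1) * variance(values)
        dof += n - 1
    if dof == 0:
        raise ValueError("need at least one compound with replicates")
    return math.sqrt(num / dof)
```

A model that reproduced the true values exactly would still show an RMSE of roughly this pooled SD against new measurements, which is one way replicate data bound the achievable performance of in silico models.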