2018
DOI: 10.1021/acs.jcim.8b00114

General Approach to Estimate Error Bars for Quantitative Structure–Activity Relationship Predictions of Molecular Activity

Abstract: Key requirements for quantitative structure-activity relationship (QSAR) models to gain acceptance by regulatory authorities include a defined domain of applicability (DA) and appropriate measures of goodness-of-fit, robustness, and predictivity. Hence, many DA metrics have been developed over the past two decades. The most intuitive are perhaps distance-to-model metrics, which are most commonly defined in terms of the mean distance between a molecule and its k nearest training samples. Detailed evaluations ha…
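The distance-to-model idea in the abstract (mean distance between a molecule and its k nearest training samples) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `mean_knn_distance`, the choice of Euclidean distance, and the toy 2-D descriptors are assumptions made for illustration only.

```python
import numpy as np

def mean_knn_distance(query, train, k=3):
    """Mean Euclidean distance from each query row to its k nearest training rows.

    Hypothetical helper: the distance measure and descriptors are assumptions,
    not taken from the paper.
    """
    query = np.atleast_2d(np.asarray(query, dtype=float))
    train = np.asarray(train, dtype=float)
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Pairwise distances, shape (n_query, n_train)
    diffs = query[:, None, :] - train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep the k smallest distances for each query row, then average them
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Toy descriptors: four training "molecules" on the corners of a unit square
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
inside = mean_knn_distance([0.5, 0.5], train, k=2)   # close to the training data
outside = mean_knn_distance([5.0, 5.0], train, k=2)  # far from the training data
```

In this toy case the query far from the training set receives a larger distance-to-model value than the one at the center of the square, which is the behavior such metrics rely on when flagging predictions outside the domain of applicability.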

citations
Cited by 41 publications
(48 citation statements)
references
References 41 publications
“…To demonstrate the advantages of the latent space distance metric in a quantitative fashion, we compare to three established uncertainty metrics. This assessment is particularly motivated by the nature of chemical discovery applications 8, where data set sizes are often smaller and have more broadly varying chemistry than typical applications in neural network potentials 19,40 or in quantitative structure–property relationships in cheminformatics 41,52. To mimic chemical discovery efforts, we train neural networks to predict transition metal complex spin state energetics 7 and test them on diverse transition metal complexes from experimental databases.…”
Section: Results (mentioning)
confidence: 99%
“…A final class of widely applied uncertainty metrics employs distances in feature space of the test molecule to available training data to provide an estimate of molecular similarity and thus model applicability. The advantages of feature space distances are that they are easily interpreted, may be rapidly computed, and are readily applied regardless of the regression model 7,8,41,52 (Fig. 1).…”
Section: Introduction (mentioning)
confidence: 99%
“…Feature space distance metrics 5,61,70,72,110–112 have previously been motivated as a potential uncertainty measure, but because we did not carry out feature selection 60 in this work, the utility of feature space distances for estimating prediction uncertainty is limited (Supporting Information Figure S7). We thus introduce an uncertainty measure that depends directly on the data distribution in the ANN latent space 113, i.e., the space spanned by the last layer of neurons before the output layer.…”
Section: Figure (mentioning)
confidence: 99%
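The latent-space idea quoted above (measuring distance in the space spanned by the last hidden layer rather than in raw feature space) can be sketched as follows. This is only an illustration under stated assumptions: the one-hidden-layer network, its random weights, the `tanh` activation, and the use of the nearest-training-point distance are all hypothetical choices, not the citing authors' method.

```python
import numpy as np

def latent(x, w_hidden, b_hidden):
    """Activations of the last hidden layer of a toy one-hidden-layer network."""
    return np.tanh(np.atleast_2d(np.asarray(x, dtype=float)) @ w_hidden + b_hidden)

def latent_distance(x_query, x_train, w_hidden, b_hidden):
    """Distance from each query point to its nearest training point in latent space.

    Assumption: nearest-neighbor Euclidean distance; the quoted text does not
    specify how the latent-space measure is computed.
    """
    z_query = latent(x_query, w_hidden, b_hidden)
    z_train = latent(x_train, w_hidden, b_hidden)
    dists = np.linalg.norm(z_query[:, None, :] - z_train[None, :, :], axis=-1)
    return dists.min(axis=1)

# Hypothetical weights standing in for a trained model (3 features -> 4 latent units)
rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(3, 4))
b_hidden = rng.normal(size=4)

x_train = rng.normal(size=(20, 3))
seen = latent_distance(x_train[0], x_train, w_hidden, b_hidden)      # a training point
unseen = latent_distance([10.0, -10.0, 10.0], x_train, w_hidden, b_hidden)
```

A point that is already in the training set has a latent distance of zero by construction; any other query gets a non-negative value that grows as its hidden-layer representation moves away from the training data.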
“…To demonstrate the advantages of the latent space distance metric in a quantitative fashion, we compare to three established uncertainty metrics. This assessment is particularly motivated by the nature of chemical discovery applications 8,25, where data set sizes are often smaller and have more broadly varying chemistry than typical applications in neural network potentials 19,41 and in quantitative structure-property relationships in cheminformatics 42,52. To mimic chemical discovery efforts, we train neural networks to predict transition metal complex spin state energetics 7 and test them on diverse transition metal complexes from experimental databases.…”
Section: Results (mentioning)
confidence: 99%
“…A final class of widely applied uncertainty metrics employs distances in feature space of the test molecule to available training data to provide an estimate of molecular similarity and thus model applicability. The advantages of feature space distances are that they are easily interpreted, may be rapidly computed, and are readily applied regardless of the regression model 7,25,42,52 (Figure 1). We used 7,8,25 high feature space distances to successfully reduce model prediction errors on retained points while still discovering new transition metal complexes.…”
Section: Introduction (mentioning)
confidence: 99%