2018
DOI: 10.1021/acs.jcim.8b00348

Dissecting Machine-Learning Prediction of Molecular Activity: Is an Applicability Domain Needed for Quantitative Structure–Activity Relationship Models Based on Deep Neural Networks?

Abstract: Deep neural networks (DNNs) are the major drivers of recent progress in artificial intelligence. They have emerged as the machine-learning method of choice in solving image and speech recognition problems, and their potential has raised the expectation of similar breakthroughs in other fields of study. In this work, we compared three machine-learning methods: DNN, random forest (a popular conventional method), and variable nearest neighbor (arguably the simplest method), in their ability to predict the molecula…
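
As an illustration of the kind of comparison the abstract describes, the sketch below fits a random forest and a plain k-nearest-neighbor regressor (used here only as a simple stand-in for the variable nearest neighbor method) on synthetic fingerprint-like data. The dataset, descriptors, and hyperparameters are illustrative assumptions, not those used in the paper.

# Illustrative sketch only: comparing a random forest with a nearest-neighbor
# baseline on synthetic binary "fingerprints". Nothing here reproduces the
# paper's data, descriptors, or models.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 1024)).astype(float)    # fingerprint-like features
y = X[:, :20].sum(axis=1) + rng.normal(0, 1.0, size=500)  # synthetic "activity"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "k-nearest neighbors": KNeighborsRegressor(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: R^2 = {r2_score(y_test, model.predict(X_test)):.3f}")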

Cited by 42 publications (35 citation statements)
References 29 publications (56 reference statements)
“…Feature space distance metrics 5,61,70,72,110–112 have previously been motivated as a potential uncertainty measure, but because we did not carry out feature selection 60 in this work, the utility of feature space distances for estimating prediction uncertainty is limited (Supporting Information Figure S7). We thus introduce an uncertainty measure that depends directly on the data distribution in the ANN latent space 113, i.e., the space spanned by the last layer of neurons before the output layer.…”
Section: Figure
Mentioning confidence: 99%
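
A minimal sketch of a latent-space distance measure of the kind described in the statement above: prediction uncertainty is approximated by the distance from a query point to its nearest training points in the space spanned by the last hidden layer of a small feed-forward network. The data, network size, and choice of k are assumptions made for illustration, not details of the cited work.

# Latent-space distance as an uncertainty proxy (illustrative sketch).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 50))
y_train = 2.0 * X_train[:, 0] + rng.normal(0, 0.1, size=400)
X_test = rng.normal(size=(50, 50))

net = MLPRegressor(hidden_layer_sizes=(64, 32), activation="relu",
                   max_iter=2000, random_state=0).fit(X_train, y_train)

def latent(model, X):
    # Forward pass through the hidden layers only (ReLU), i.e. the space
    # spanned by the last layer of neurons before the output layer.
    h = X
    for W, b in zip(model.coefs_[:-1], model.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h

nbrs = NearestNeighbors(n_neighbors=5).fit(latent(net, X_train))
dist, _ = nbrs.kneighbors(latent(net, X_test))
uncertainty = dist.mean(axis=1)  # larger distance = less trustworthy prediction
print("most / least certain test indices:", uncertainty.argmin(), uncertainty.argmax())

Measuring the distance in the learned representation rather than in the raw descriptor space is what distinguishes this proxy from the feature-space metrics that the statement notes are of limited use when no feature selection is performed.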
“…Most of the deep learning models in drug discovery currently do not consider applicability domain restrictions 166,167, that is, the region of chemical space where statistical learning assumptions are met. These restrictions should, in the authors' opinion, be considered an integral element of XAI, as their assessment and a rigorous evaluation of model accuracy have proven to be more relevant for decision-making than the modelling approach itself 168. Knowing when to apply which particular model will probably help address the problem of high confidence of deep learning models on wrong predictions 121 and avoid unnecessary extrapolations at the same time.…”
Section: Discussion
Mentioning confidence: 99%
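
One common way such an applicability-domain restriction can be implemented is sketched below, under the assumption of a simple distance-to-training-set criterion: a query compound is flagged as outside the domain when its mean distance to the k nearest training compounds exceeds a percentile threshold derived from the training set itself. The descriptors, k, and the 95th-percentile cutoff are illustrative choices, not prescribed by the cited references.

# Simple distance-based applicability-domain check (illustrative sketch).
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 20))                       # training-set descriptors
X_query = np.vstack([rng.normal(size=(5, 20)),             # queries near the training data
                     rng.normal(loc=6.0, size=(5, 20))])   # queries far from it

k = 5
nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X_train)

# Threshold from the training set itself; drop each point's zero self-distance.
train_dist, _ = nbrs.kneighbors(X_train)
threshold = np.percentile(train_dist[:, 1:].mean(axis=1), 95)

query_dist, _ = nbrs.kneighbors(X_query, n_neighbors=k)
inside = query_dist.mean(axis=1) <= threshold
print("within applicability domain:", inside)

Predictions for the flagged queries would then either be withheld or reported with an explicit warning, which is the kind of decision-relevant accuracy assessment the statement argues for.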
“…In addition to the question of how an applicability domain can be successfully implemented, it is also interesting to see which models need a defined applicability domain. Liu and coworkers showed that, while it is generally accepted that traditional machine learning models need an applicability domain, deep learning models do not generalize much better either. However, this is not very surprising, given that until now most deep learning models have not been built on much larger datasets than traditional models.…”
Section: Machine Learning Based Predictions
Mentioning confidence: 99%