Interpretation of Quantitative Structure–Activity Relationship Models: Past, Present, and Future

Polishchuk, Pavel

doi:10.1021/acs.jcim.7b00274

Cited by 179 publications (146 citation statements)

References 121 publications (245 reference statements)

Mentioning: 139

“…Random forest (RF) [36] is a supervised learning algorithm with an ensemble of decision trees generated from a bootstrapped (bagged) sampling of compounds and features. It is widely used in the traditional structure-property relation research [37], and was considered as a "gold standard" according to its robustness, easy usage and high prediction accuracy in structure-property relationship research [38]. Here, the ECFP with a fixed length of 1024 [12] was used with the RF model, which was implemented in Python 3.6.3 [39] with the package Scikit-learn, version 0.21.2 [40].…”

Section: Random Forest (mentioning)

confidence: 99%

A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility

et al. 2020


Efficient and accurate prediction of molecular properties, such as lipophilicity and solubility, is highly desirable for rational compound design in chemical and pharmaceutical industries. To this end, we build and apply a graph neural network framework called self-attention-based message-passing neural network (SAMPN) to study the relationship between chemical properties and structures in an interpretable way. The main advantages of SAMPN are that it directly uses chemical graphs and breaks the black-box mold of many machine/deep learning methods. Specifically, its attention mechanism indicates the degree to which each atom of the molecule contributes to the property of interest, and these results are easily visualized. Further, SAMPN outperforms random forests and the deep learning framework MPN from Deepchem. In addition, another formulation of SAMPN (Multi-SAMPN) can simultaneously predict multiple chemical properties with higher accuracy and efficiency than other models that predict one specific chemical property. Moreover, SAMPN can generate chemically visible and interpretable results, which can help researchers discover new pharmaceuticals and materials. The source code of the SAMPN prediction pipeline is freely available at GitHub (https://github.com/tbwxmu/SAMPN).


“…Most of them share a common drawback: failure to interpret the underlying causal relationships between the inputs and the response treating the ANN models essentially as a black box. Coefficient of correlation/determination (R/R²) values calculated for a linear fit between predicted and measured values are often erroneously (as noted by Héberger) used as performance metrics invalidating the models.…”

Section: Introduction (mentioning)

confidence: 99%
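One common reading of the objection in the statement above is that a correlation-based R² measures how well a fitted line relates predictions to measurements, not how close the predictions are to the measurements themselves. The sketch below illustrates this with hypothetical numbers (the data, the function names, and the choice of a scaled-and-shifted prediction are all assumptions made for illustration, not taken from the cited papers). It compares the squared Pearson correlation with the coefficient of determination computed against the identity line.

```python
from statistics import mean


def pearson_r_squared(measured, predicted):
    """Squared Pearson correlation, i.e. the R^2 of a least-squares
    line fitted through the (measured, predicted) points."""
    m_bar, p_bar = mean(measured), mean(predicted)
    cov = sum((m - m_bar) * (p - p_bar) for m, p in zip(measured, predicted))
    var_m = sum((m - m_bar) ** 2 for m in measured)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    return cov ** 2 / (var_m * var_p)


def coefficient_of_determination(measured, predicted):
    """1 - SS_res / SS_tot, evaluated directly against the measurements
    (identity line), without refitting a line to the predictions."""
    m_bar = mean(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - m_bar) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: predictions are perfectly correlated with the
# measurements but systematically wrong (scaled by 0.5, shifted by +4).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [0.5 * m + 4.0 for m in measured]

r2_fit = pearson_r_squared(measured, predicted)
r2_pred = coefficient_of_determination(measured, predicted)

print(f"squared correlation:          {r2_fit:.3f}")   # 1.000
print(f"coefficient of determination: {r2_pred:.3f}")  # -2.375
```

In this toy case the squared correlation is perfect while the coefficient of determination is negative, meaning the predictions are worse than simply predicting the mean of the measurements.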

Interpretation of ANN‐based QSAR models for prediction of antioxidant activity of flavonoids

2018


Quantitative structure-activity relationships (QSARs) built using machine learning methods, such as artificial neural networks (ANNs), are powerful in prediction of (antioxidant) activity from quantum mechanical (QM) parameters describing the molecular structure, but are usually not interpretable. This obvious difficulty is one of the most common obstacles in application of ANN-based QSAR models for design of potent antioxidants or elucidating the underlying mechanism. Interpreting the resulting models is often omitted or performed erroneously altogether. In this work, a comprehensive comparative study of six methods (PaD, PaD2, weights, stepwise, perturbation and profile) for exploration and interpretation of ANN models built for prediction of Trolox-equivalent antioxidant capacity (TEAC) from QM descriptors is presented. Sum of ranking differences (SRD) was used for ranking of the six methods with respect to the contributions of the calculated QM molecular descriptors toward TEAC. The results show that the PaD, PaD2 and profile methods are the most stable and give rise to realistic interpretation of the observed correlations. Therefore, they are safely applicable for future interpretations without the opinion of an experienced chemist or bio-analyst. © 2018 Wiley Periodicals, Inc.


“…An emerging technology is explainable AI which tries to open the black box. There are many already existing possibilities to explain model prediction, as shown by Polishchuk. Newer technologies are emerging, especially with regard to neural networks.…”

Section: Where Are We Headed? (mentioning)

confidence: 99%

In silico toxicology: From structure–activity relationships towards deep learning and adverse outcome pathways

Hemmerich, Ecker 2020, WIREs Comput Mol Sci


In silico toxicology is an emerging field. It is gaining importance as research aims to decrease the use of animal experiments, as suggested in the 3R principles by Russell and Burch. In silico toxicology is a means to identify hazards of compounds before synthesis, and thus in very early stages of drug development. For chemical industries, as well as regulatory agencies, it can aid in gap‐filling and guide risk minimization strategies. Techniques such as structural alerts, read‐across, quantitative structure–activity relationship, machine learning, and deep learning allow in silico toxicology to be used in many cases, some even when data is scarce. In particular, the concept of adverse outcome pathways puts all techniques into a broader context and can elucidate predictions by mechanistic insights. This article is categorized under: Structure and Mechanism > Computational Biochemistry and Biophysics; Data Science > Chemoinformatics
