Cross-validation strategies in QSPR modelling of chemical reactions

Rakhimbekova, Assima; Akhmetshin, Tagir; Minibaeva, G.I.; Nugmanov, Ramil; Gimadiev, Timur; Madzhidov, Timur; Baskin, Igor I.; Varnek, Alexandre

doi:10.1080/1062936x.2021.1883107

Cited by 15 publications

(13 citation statements)

References 41 publications

(64 reference statements)

“…For this reason, testing “extrapolative” splits has become popular in these yield prediction tasks to gauge the value of different molecular or reaction representations. 158,159 An important caveat of these studies is that data from HTE is qualitatively different from data that is typically published. In particular, a single paper might include only a dozen substrates; combining datasets from multiple papers describing the same reaction type will lead to confounding variables like the precise choice of conditions.…”

Section: Reaction Development Goals (mentioning)

confidence: 99%

Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery

Stuyver, Coley. 2023. Chem. Sci.

“…Rigorously splitting a dataset into training, validation and test sets is a crucial task that can be overlooked easily, and may lead to drastically wrong reported performances. 74,75 In the following, we showcase this pitfall by training a model of the QM9 target internal energy at temperatures T equal 0 K and 298 K. We treat the temperature as an input (in addition to the molecular graph), and train on the single property U(T). The temperature is appended to the aggregated molecular embedding (after the message-passing neural network, before the feed-forward neural network).…”

Section: Test Set Contamination (mentioning)

confidence: 99%

Characterizing Uncertainty in Machine Learning for Chemistry

Heid, McGill, Vermeire, et al. 2023. Preprint

Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on datasets of molecular properties, we show important trends in model performance associated with the level of noise in the dataset, size of the dataset, model architecture, molecule representation, ensemble size, and dataset splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance, and 4) evaluations of cross-validation models understate their performance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.
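The test set contamination described in the citation statement above (the same molecule appearing at both 0 K and 298 K) can be illustrated with a group-aware split. The following is a minimal sketch, not the cited authors' code: the function name `group_split`, the toy `records` list, and the molecule labels are all hypothetical. It assigns every record of a given molecule to the same side of the split, so that no molecule seen in training reappears in the test set at a different temperature.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that all records sharing a group stay together.

    Hypothetical helper for illustration only: group_key maps a record
    to its group label (here, the molecule identifier).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[group_key(rec)].append(rec)

    # Shuffle the group labels, not the individual records.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    n_test = max(1, round(test_fraction * len(keys)))
    test_keys = keys[:n_test]
    train_keys = keys[n_test:]

    train = [r for k in train_keys for r in groups[k]]
    test = [r for k in test_keys for r in groups[k]]
    return train, test


# Toy data (assumed): each molecule appears once per temperature in kelvin.
records = [(f"mol{i}", T) for i in range(10) for T in (0.0, 298.0)]
train, test = group_split(records, group_key=lambda r: r[0])
```

A purely random split over `records` would, by contrast, often place `("mol3", 0.0)` in training and `("mol3", 298.0)` in the test set, which is the leakage the quoted passage warns about.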

“…A plasma protein binding module was built using the graph CNN approach. ADME and toxicity predictions also guide the required changes in the existing lead to develop a potential candidate drug (Rakhimbekova et al, 2021; Tuntland et al, 2014). Knowledge about toxic substructure and chemical entities, offsite recognition, drug metabolites interaction, and drug‐drug interaction can be used to develop a holistic model for toxicity predictions.…”

Section: Approaches In Drug Discovery (mentioning)

confidence: 99%

Machine learning approaches and their applications in drug discovery and design

et al. 2022

This review is focused on several machine learning approaches used in chemoinformatics. Machine learning approaches provide tools and algorithms to improve drug discovery. Many physicochemical properties of drugs like toxicity, absorption, drug‐drug interaction, carcinogenesis, and distribution have been effectively modeled by QSAR techniques. Machine learning is a subset of artificial intelligence, and this technique has shown tremendous potential in the field of drug discovery. Techniques discussed in this review are capable of modeling non‐linear datasets, as well as big data of increasing depth and complexity. Various machine learning‐based approaches are being used for drug target prediction, modeling the structure of drug target, binding site prediction, ligand‐based similarity searching, de novo designing of ligands with desired properties, developing scoring functions for molecular docking, building QSAR model for biological activity prediction, and prediction of pharmacokinetic and pharmacodynamic properties of ligands. In recent years, these predictive tools and models have achieved good accuracy. By the use of more related input data, relevant parameters, and appropriate algorithms, the accuracy of these predictions can be further improved.
