David A. Winkler scite author profile

Beware of R²: Simple, Unambiguous Assessment of the Prediction Accuracy of QSAR and QSPR Models

The statistical metrics used to characterize the external predictivity of a model, i.e., how well it predicts the properties of an independent test set, have proliferated over the past decade. This paper clarifies some apparent confusion over the use of the coefficient of determination, R², as a measure of model fit and predictive power in QSAR and QSPR modelling. R² (or r²) has been used in various contexts in the literature in conjunction with training and test data, for both ordinary linear regression and regression through the origin as well as with linear and nonlinear regression models. We analyze the widely adopted model fit criteria suggested by Golbraikh and Tropsha¹ in a strict statistical manner. Shortcomings in these criteria are identified and a clearer and simpler alternative method to characterize model predictivity is provided. The intent is not to repeat the well-documented arguments for model validation using test data, but to guide the application of R² as a model fit statistic. Examples are used to illustrate both correct and incorrect use of R². Reporting the root mean squared error or equivalent measures of dispersion, typically of more practical importance than R², is also encouraged, and important challenges in addressing the needs of different categories of users such as computational chemists, experimental scientists, and regulatory decision support specialists are outlined.
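As a hedged illustration of the distinction drawn in this abstract (not code from the paper), the Python sketch below assumes one common definition of R², namely 1 − SS_res/SS_tot evaluated on test data, and contrasts it with the squared Pearson correlation and the root mean squared error. The function names and the toy data are hypothetical.

```python
import numpy as np

def r2_determination(y_obs, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)        # residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)  # total sum of squares about the mean
    return float(1.0 - ss_res / ss_tot)

def r2_pearson(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    r = np.corrcoef(y_obs, y_pred)[0, 1]
    return float(r ** 2)

def rmse(y_obs, y_pred):
    """Root mean squared error, in the units of the modelled property."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

# Hypothetical test set: predictions are perfectly correlated with the
# observations but systematically offset by +2 units.
y_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = y_obs + 2.0

print("squared Pearson r:", r2_pearson(y_obs, y_pred))
print("1 - SS_res/SS_tot:", r2_determination(y_obs, y_pred))
print("RMSE:", rmse(y_obs, y_pred))
```

On this toy data the squared correlation is 1 while the coefficient of determination is −1 and the RMSE is 2, which shows how a correlation-based r² on its own can overstate predictive accuracy.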


Quantitative Structure–Property Relationship Modeling of Diverse Materials Properties

Epa et al. 2012


Bayesian Regularization of Neural Networks

Burden, Winkler 2008

414

313


Bayesian regularized artificial neural networks (BRANNs) are more robust than standard back-propagation nets and can reduce or eliminate the need for lengthy cross-validation. Bayesian regularization is a mathematical process that converts a nonlinear regression into a "well-posed" statistical problem in the manner of a ridge regression. The advantage of BRANNs is that the models are robust and the validation process, which scales as O(N²) in normal regression methods, such as back propagation, is unnecessary. These networks provide solutions to a number of problems that arise in QSAR modeling, such as choice of model, robustness of model, choice of validation set, size of validation effort, and optimization of network architecture. They are difficult to overtrain, since evidence procedures provide an objective Bayesian criterion for stopping training. They are also difficult to overfit, because the BRANN calculates and trains on a number of effective network parameters or weights, effectively turning off those that are not relevant. This effective number is usually considerably smaller than the number of weights in a standard fully connected back-propagation neural net. Automatic relevance determination (ARD) of the input variables can be used with BRANNs, and this allows the network to "estimate" the importance of each input. The ARD method ensures that irrelevant or highly correlated indices used in the modeling are neglected, and it shows which variables are most important for modeling the activity data. This chapter outlines the equations that define the BRANN method plus a flowchart for producing a BRANN-QSAR model. Some results of the use of BRANNs on a number of data sets are illustrated and compared with other linear and nonlinear models.
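The "effective number of parameters" idea can be sketched for the simpler linear case. The example below is an assumption-laden analogue, not the BRANN method itself: it uses plain ridge regression and the standard ridge effective degrees of freedom, the sum of s_i²/(s_i² + alpha) over the singular values s_i of the descriptor matrix. All names and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Ridge weights from the regularized normal equations (X'X + alpha*I) w = X'y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

def effective_parameters(X, alpha):
    """Effective number of parameters of a ridge fit: sum of s^2 / (s^2 + alpha)."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values of the descriptor matrix
    return float(np.sum(s ** 2 / (s ** 2 + alpha)))

# Hypothetical data: 5 independent descriptors plus an exact copy of the
# first one, i.e. 6 nominal weights but only 5 independent directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X = np.hstack([X, X[:, :1]])
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=50)

for alpha in (1e-8, 1.0, 100.0):
    w = ridge_fit(X, y, alpha)
    print(alpha, effective_parameters(X, alpha), np.round(w, 3))
```

In this sketch the effective count stays below the nominal six weights because of the redundant descriptor, and it shrinks further as the penalty alpha grows. This is loosely analogous to the way the abstract describes a BRANN training on fewer effective weights than the network nominally has.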


Robust QSAR Models Using Bayesian Regularized Neural Networks

1999


We describe the use of Bayesian regularized artificial neural networks (BRANNs) in the development of QSAR models. These networks have the potential to solve a number of problems which arise in QSAR modeling such as: choice of model; robustness of model; choice of validation set; size of validation effort; and optimization of network architecture. The application of the methods to QSAR of compounds active at the benzodiazepine and muscarinic receptors is illustrated.


Materials Genome in Action: Identifying the Performance Limits of Physical Hydrogen Storage

et al. 2017


The Materials Genome is in action: the molecular codes for millions of materials have been sequenced, predictive models have been developed, and now the challenge of hydrogen storage is targeted. Renewably generated hydrogen is an attractive transportation fuel with zero carbon emissions, but its storage remains a significant challenge. Nanoporous adsorbents have shown promising physical adsorption of hydrogen approaching targeted capacities, but the scope of studies has remained limited. Here the Nanoporous Materials Genome, containing over 850 000 materials, is analyzed with a variety of computational tools to explore the limits of hydrogen storage. Optimal features that maximize net capacity at room temperature include pore sizes of around 6 Å and void fractions of 0.1, while at cryogenic temperatures pore sizes of 10 Å and void fractions of 0.5 are optimal. Our top candidates are found to be commercially attractive as “cryo-adsorbents”, with promising storage capacities at 77 K and 100 bar with 30% enhancement to 40 g/L, a promising alternative to liquefaction at 20 K and compression at 700 bar.


A renaissance of neural networks in drug discovery

Baskin, Winkler, Tetko 2016

Expert Opinion on Drug Discovery

204

158


Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest further improvements may be gained in the analysis of large chemical data sets. It is anticipated that neural networks will be more widely used in drug discovery in the future, and applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.


Unzipping the role of chirality in nanoscale self-assembly of tripeptide hydrogels

et al. 2012


Change of chirality is a useful tool to manipulate the aqueous self-assembly behaviour of uncapped, hydrophobic tripeptides. In contrast with other short peptides, these tripeptides form hydrogels at a physiological pH without the aid of organic solvents or end-capping groups (e.g. Fmoc). The novel hydrogel forming peptide (D)Leu-Phe-Phe ((D)LFF) and its epimer Leu-Phe-Phe (LFF) exemplify dramatic supramolecular effects induced by subtle changes to stereochemistry. Only the d-amino acid-containing peptide instantly forms a hydrogel in aqueous solution following a pH switch, generating long fibres (>100 μm) that entangle into a 3D network. However, unexpected nanostructures are observed for both peptides and they are particularly heterogeneous for LFF. Structural analyses using CD, FT-IR and fluorescent amyloid staining reveal anti-parallel beta-sheets for both peptides. XRD analysis also identifies key distances consistent with beta-sheet formation in both peptides, but suggests additional high molecular order and extended molecular length for (D)LFF only. Molecular modelling of the two peptides highlights the key interactions responsible for self-assembly; in particular, rapid self-assembly of (D)LFF is promoted by a phenylalanine zipper, which is not possible because of steric factors for LFF. In conclusion, this study elucidates for the first time the molecular basis for how chirality can dramatically influence supramolecular organisation in very short peptide sequences.




David A. Winkler

QSAR without borders

Beware of R²: Simple, Unambiguous Assessment of the Prediction Accuracy of QSAR and QSPR Models

Quantitative Structure–Property Relationship Modeling of Diverse Materials Properties

Bayesian Regularization of Neural Networks

Robust QSAR Models Using Bayesian Regularized Neural Networks

Materials Genome in Action: Identifying the Performance Limits of Physical Hydrogen Storage

A renaissance of neural networks in drug discovery

Unzipping the role of chirality in nanoscale self-assembly of tripeptide hydrogels
