Word cloud summary of diverse topics associated with QSAR modeling that are discussed in this review.
The statistical metrics used to characterize the external predictivity of a model, i.e., how well it predicts the properties of an independent test set, have proliferated over the past decade. This paper clarifies some apparent confusion over the use of the coefficient of determination, R², as a measure of model fit and predictive power in QSAR and QSPR modelling. R² (or r²) has been used in various contexts in the literature in conjunction with training and test data, for both ordinary linear regression and regression through the origin as well as with linear and nonlinear regression models. We analyze the widely adopted model fit criteria suggested by Golbraikh and Tropsha (ref 1) in a strict statistical manner. Shortcomings in these criteria are identified and a clearer and simpler alternative method to characterize model predictivity is provided. The intent is not to repeat the well-documented arguments for model validation using test data, but to guide the application of R² as a model fit statistic. Examples are used to illustrate both correct and incorrect use of R². Reporting the root mean squared error or equivalent measures of dispersion, typically of more practical importance than R², is also encouraged, and important challenges in addressing the needs of different categories of users such as computational chemists, experimental scientists, and regulatory decision support specialists are outlined.
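As a rough illustration of why the choice of statistic matters, the sketch below compares two quantities that are both often reported as "R2" for a test set, alongside the root mean squared error. It is not taken from the paper: the function names, the toy data, and the choice of the test-set mean in the total sum of squares are assumptions made for this example.

```python
import numpy as np

def r2_determination(y_obs, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot.
    Assumption: SS_tot is taken about the mean of the observed test values."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def r2_pearson(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    return np.corrcoef(y_obs, y_pred)[0, 1] ** 2

def rmse(y_obs, y_pred):
    """Root mean squared error of the predictions."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_obs - y_pred) ** 2))

# Hypothetical test set: predictions are perfectly correlated with the
# observations but systematically biased (y_pred = 2 * y_obs + 1).
y_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = 2.0 * y_obs + 1.0

print(r2_pearson(y_obs, y_pred))        # 1.0: correlation ignores the bias
print(r2_determination(y_obs, y_pred))  # -8.0: predictions are worse than the mean
print(rmse(y_obs, y_pred))              # about 4.24, in the units of the property
```

In this toy case the squared correlation suggests a perfect model, while the coefficient of determination and the RMSE both show that the predictions are far from the observed values.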
Possible terms to include:
Term (definition): 3D QSAR (three-dimensional quantitative structure-activity relationships)
Alternative or related terms: Comparative molecular field analysis (CoMFA), comparative molecular similarity index analysis (CoMSIA), molecular field analysis.
Bayesian regularized artificial neural networks (BRANNs) are more robust than standard back-propagation nets and can reduce or eliminate the need for lengthy cross-validation. Bayesian regularization is a mathematical process that converts a nonlinear regression into a "well-posed" statistical problem in the manner of a ridge regression. The advantage of BRANNs is that the models are robust and the validation process, which scales as O(N²) in normal regression methods, such as back propagation, is unnecessary. These networks provide solutions to a number of problems that arise in QSAR modeling, such as choice of model, robustness of model, choice of validation set, size of validation effort, and optimization of network architecture. They are difficult to overtrain, since evidence procedures provide an objective Bayesian criterion for stopping training. They are also difficult to overfit, because the BRANN calculates and trains on a number of effective network parameters or weights, effectively turning off those that are not relevant. This effective number is usually considerably smaller than the number of weights in a standard fully connected back-propagation neural net. Automatic relevance determination (ARD) of the input variables can be used with BRANNs, and this allows the network to "estimate" the importance of each input. The ARD method ensures that irrelevant or highly correlated indices used in the modeling are neglected as well as showing which are the most important variables for modeling the activity data. This chapter outlines the equations that define the BRANN method plus a flowchart for producing a BRANN-QSAR model. Some results of the use of BRANNs on a number of data sets are illustrated and compared with other linear and nonlinear models.
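The idea of an evidence procedure and an effective number of parameters can be sketched on the linear (ridge regression) analogue mentioned above. This is not the BRANN method from the chapter and involves no neural network: it is a minimal sketch of the standard evidence re-estimation updates for Bayesian linear regression, with a single regularization constant rather than ARD. The function name `evidence_ridge`, the synthetic data, and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np

def evidence_ridge(X, y, n_iter=50):
    """Bayesian ridge regression with evidence-based re-estimation of
    alpha (weight-prior precision) and beta (noise precision)."""
    n, p = X.shape
    alpha, beta = 1.0, 1.0
    # Eigenvalues of X^T X, clipped to guard against tiny negative round-off.
    eig = np.clip(np.linalg.eigvalsh(X.T @ X), 0.0, None)
    for _ in range(n_iter):
        # Posterior mean of the weights for the current alpha and beta.
        A = alpha * np.eye(p) + beta * (X.T @ X)
        w = beta * np.linalg.solve(A, X.T @ y)
        # Effective number of well-determined parameters.
        lam = beta * eig
        gamma = np.sum(lam / (alpha + lam))
        # Re-estimate the hyperparameters from the evidence.
        alpha = gamma / (w @ w)
        beta = (n - gamma) / np.sum((y - X @ w) ** 2)
    return w, gamma, alpha, beta

# Hypothetical data: 10 descriptors, of which only the first 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.5 * rng.normal(size=60)

w, gamma, alpha, beta = evidence_ridge(X, y)
print("effective parameters:", gamma, "of", X.shape[1])
print("estimated weights:", np.round(w, 2))
```

The quantity `gamma` is always below the nominal number of weights and is obtained from the training data alone, without a separate validation set. An ARD variant would assign a separate prior precision to each input instead of the single `alpha` used here.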
We describe the use of Bayesian regularized artificial neural networks (BRANNs) in the development of QSAR models. These networks have the potential to solve a number of problems which arise in QSAR modeling such as: choice of model; robustness of model; choice of validation set; size of validation effort; and optimization of network architecture. The application of the methods to QSAR of compounds active at the benzodiazepine and muscarinic receptors is illustrated.
The Materials Genome is in action: the molecular codes for millions of materials have been sequenced, predictive models have been developed, and now the challenge of hydrogen storage is targeted. Renewably generated hydrogen is an attractive transportation fuel with zero carbon emissions, but its storage remains a significant challenge. Nanoporous adsorbents have shown promising physical adsorption of hydrogen approaching targeted capacities, but the scope of studies has remained limited. Here the Nanoporous Materials Genome, containing over 850 000 materials, is analyzed with a variety of computational tools to explore the limits of hydrogen storage. Optimal features that maximize net capacity at room temperature include pore sizes of around 6 Å and void fractions of 0.1, while at cryogenic temperatures pore sizes of 10 Å and void fractions of 0.5 are optimal. Our top candidates are found to be commercially attractive as “cryo-adsorbents”, with promising storage capacities at 77 K and 100 bar with 30% enhancement to 40 g/L, a promising alternative to liquefaction at 20 K and compression at 700 bar.
Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest that further improvements may be gained in the analysis of large chemical data sets. It is anticipated that neural networks will be more widely used in drug discovery in the future, and applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.
Change of chirality is a useful tool to manipulate the aqueous self-assembly behaviour of uncapped, hydrophobic tripeptides. In contrast with other short peptides, these tripeptides form hydrogels at a physiological pH without the aid of organic solvents or end-capping groups (e.g. Fmoc). The novel hydrogel forming peptide (D)Leu-Phe-Phe ((D)LFF) and its epimer Leu-Phe-Phe (LFF) exemplify dramatic supramolecular effects induced by subtle changes to stereochemistry. Only the d-amino acid-containing peptide instantly forms a hydrogel in aqueous solution following a pH switch, generating long fibres (>100 μm) that entangle into a 3D network. However, unexpected nanostructures are observed for both peptides and they are particularly heterogeneous for LFF. Structural analyses using CD, FT-IR and fluorescent amyloid staining reveal anti-parallel beta-sheets for both peptides. XRD analysis also identifies key distances consistent with beta-sheet formation in both peptides, but suggests additional high molecular order and extended molecular length for (D)LFF only. Molecular modelling of the two peptides highlights the key interactions responsible for self-assembly; in particular, rapid self-assembly of (D)LFF is promoted by a phenylalanine zipper, which is not possible because of steric factors for LFF. In conclusion, this study elucidates for the first time the molecular basis for how chirality can dramatically influence supramolecular organisation in very short peptide sequences.