Quantitative Structure-Activity Relationship (QSAR) modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss: (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will facilitate communication between computational and experimental chemists towards collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high-quality, validated QSARs for regulatory decision making.
One of the OECD principles for model validation requires defining the Applicability Domain (AD) of a QSAR model. This is important because reliable predictions are generally limited to query chemicals that are structurally similar to the training compounds used to build the model. Characterizing the interpolation space is therefore central to defining the AD. In this study, some existing descriptor-based approaches to this task are discussed and compared by applying them to validated datasets from the literature. The algorithms adopted by the different approaches define the interpolation space in several ways, while the thresholds chosen contribute significantly to which predictions are treated as extrapolations. For each dataset and approach implemented in this study, the comparison was carried out by considering the model statistics and the relative position of the test set with respect to the training space.
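As an illustration of one descriptor-based approach of this kind, the sketch below uses leverage to decide whether a query compound falls inside the interpolation space of the training set. The abstract does not say which approaches or thresholds were compared, so everything here is an assumption: the function name `leverage_domain` is hypothetical, and the cut-off `k * (p + 1) / n` with `k = 3` is a commonly used convention rather than a value taken from the study.

```python
import numpy as np

def leverage_domain(X_train, X_query, k=3.0):
    """Leverage-based applicability domain check (illustrative sketch).

    X_train: (n, p) descriptor matrix of the training compounds.
    X_query: (m, p) descriptor matrix of the query compounds.
    Returns (leverages, threshold, inside_mask).
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    n, p = X_train.shape

    # Center on the training mean; this corresponds to a model with an intercept.
    mean = X_train.mean(axis=0)
    Xc = X_train - mean

    # The pseudo-inverse keeps the sketch usable with collinear descriptors.
    core = np.linalg.pinv(Xc.T @ Xc)

    # h_i = 1/n + (x_i - mean)^T (Xc^T Xc)^+ (x_i - mean)
    Q = X_query - mean
    h = 1.0 / n + np.einsum("ij,jk,ik->i", Q, core, Q)

    # Assumed warning threshold: k * (p + 1) / n, with k = 3 by convention.
    threshold = k * (p + 1) / n
    return h, threshold, h <= threshold
```

A query compound whose leverage exceeds the threshold lies far from the centroid of the training descriptors, so its prediction would be treated as an extrapolation. Changing `k` moves compounds in or out of the domain, which is one concrete way a threshold choice affects the outcome.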
This paper deals with the problem of evaluating the predictive ability of QSAR models and continues the discussion about proper estimates of the predictive ability from an external evaluation set reported in Schüürmann G., Ebert R.-U., et al. External Validation and Prediction Employing the Predictive Squared Correlation Coefficient--Test Set Activity Mean vs Training Set Activity Mean. J. Chem. Inf. Model. 2008, 48, 2140-2145. Schüürmann et al. previously discussed two formulas for calculating the predictive squared correlation coefficient Q². The first, adopted by the current OECD guidelines on QSAR validation, is based on the sum of squares (SS) of the external test set referred to the training set response mean. The second is based on the SS of the external test set referred to the test set response mean. In addition to these two formulas, a third is evaluated here, in which the SS is based on the mean deviations of the observed values from the training set mean, computed over the training set instead of the external evaluation set.
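The three variants described above can be sketched as follows. The abstract gives the formulas only in words, so the function names `q2_f1`, `q2_f2`, and `q2_f3` are labels chosen here for convenience, and the third function assumes that "mean deviations" means both sums of squares are divided by the size of the set they run over.

```python
import numpy as np

def _press(y_ext, y_pred):
    """Sum of squared prediction errors on the external set."""
    y_ext = np.asarray(y_ext, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sum((y_ext - y_pred) ** 2)

def q2_f1(y_ext, y_pred, y_train):
    """External SS taken around the TRAINING set mean."""
    y_ext = np.asarray(y_ext, dtype=float)
    mean_train = np.mean(np.asarray(y_train, dtype=float))
    ss = np.sum((y_ext - mean_train) ** 2)
    return 1.0 - _press(y_ext, y_pred) / ss

def q2_f2(y_ext, y_pred):
    """External SS taken around the EXTERNAL (test) set mean."""
    y_ext = np.asarray(y_ext, dtype=float)
    ss = np.sum((y_ext - y_ext.mean()) ** 2)
    return 1.0 - _press(y_ext, y_pred) / ss

def q2_f3(y_ext, y_pred, y_train):
    """Mean squared error on the external set relative to the mean
    squared deviation of the TRAINING set around its own mean."""
    y_ext = np.asarray(y_ext, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    mse_ext = _press(y_ext, y_pred) / y_ext.size
    msd_train = np.sum((y_train - y_train.mean()) ** 2) / y_train.size
    return 1.0 - mse_ext / msd_train
```

With training responses `[1, 2, 3, 4, 5]`, external responses `[4, 5]`, and predictions `[4.5, 4.5]`, the three functions return 0.9, 0.0, and 0.875 respectively. The same predictions therefore look very different depending on the reference mean, and only the third variant uses a denominator that does not change with the composition of the external set.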
Novel molecular descriptors based on a leverage matrix similar to that defined in statistics and usually used for regression diagnostics are presented. This leverage matrix, called Molecular Influence Matrix (MIM), is here proposed as a new molecular representation easily calculated from the spatial coordinates of the molecule atoms in a chosen conformation. The proposed molecular descriptors are called GETAWAY (GEometry, Topology, and Atom-Weights AssemblY) as they try to match 3D-molecular geometry provided by the molecular influence matrix and atom relatedness by molecular topology, with chemical information by using different atomic weightings (atomic mass, polarizability, van der Waals volume, and electronegativity, together with unit weights). A first set of molecular descriptors, called H-GETAWAY, is derived by using only the information provided by the molecular influence matrix, while a second set, called R-GETAWAY, combines this information with geometric interatomic distances in the molecule. The prediction ability in structure-property correlations of the new descriptors was tested by analyzing regressions of these descriptors for selected properties of octanes.
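A minimal sketch of the leverage-matrix construction follows, assuming the usual regression form H = M (MᵀM)⁻¹ Mᵀ, where M holds the Cartesian coordinates of the atoms centered on their unweighted geometric center. The abstract does not spell out the formula or the centering, so both are assumptions, as is the function name `molecular_influence_matrix`. The pseudo-inverse is a substitution made here so that planar and linear conformations, for which MᵀM is singular, do not raise an error. The GETAWAY descriptors themselves are not computed.

```python
import numpy as np

def molecular_influence_matrix(coords):
    """Leverage-style matrix from atomic coordinates (illustrative sketch).

    coords: (n_atoms, 3) array of x, y, z coordinates for one conformation.
    Returns an (n_atoms, n_atoms) symmetric matrix whose diagonal holds the
    leverage of each atom.
    """
    M = np.asarray(coords, dtype=float)

    # Center on the geometric center (assumed: unweighted mean of coordinates).
    M = M - M.mean(axis=0)

    # H = M (M^T M)^+ M^T; pinv handles planar or linear geometries.
    return M @ np.linalg.pinv(M.T @ M) @ M.T
```

For four atoms at alternating corners of a cube, `(1, 1, 1)`, `(1, -1, -1)`, `(-1, 1, -1)`, and `(-1, -1, 1)`, every diagonal element is 0.75 and the trace is 3. For a planar arrangement the trace drops to 2, since the trace of this projection matrix equals the rank of the centered coordinate matrix.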
In a previous paper the theory of the new molecular descriptors called GETAWAY (GEometry, Topology, and Atom-Weights AssemblY) was explained. These descriptors have been proposed with the aim of matching 3D-molecular geometry, atom relatedness, and chemical information. In this paper prediction ability in structure-property correlations of GETAWAY descriptors has been tested extensively by analyzing the regressions of these descriptors for selected properties of some reference compound classes. Moreover, the general performance of the new descriptors in QSAR/QSPR has been evaluated with respect to other well-known sets of molecular descriptors.