We validate an automated implementation of a combined Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) method (VSGB 2.0 energy model) on a large and diverse selection of 855 protein-ligand complexes. Although this data set is diverse with respect to both protein families and ligands, a significant correlation (R² = 0.63) between calculated and experimental binding affinities is obtained after flawed structures are carefully removed. Consistent explanations for "outlier" complexes are found. Visual analysis of the crystal structures and recourse to the original literature reveal that neglect of explicit solvent, ligand strain, and entropy contributes to the under- and overestimation of computed affinities. The limits of the Molecular Mechanics/Implicit Solvent approach to accurately estimating protein-ligand binding affinities are discussed, as is the influence of the quality of protein-ligand complexes on computed binding free energies.
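As a rough illustration of the correlation statistic quoted above, the sketch below computes a squared Pearson correlation between calculated and experimental affinities. The abstract does not say how R² was defined, so treating it as the squared Pearson coefficient is an assumption, and the function name `r_squared` is hypothetical.

```python
def r_squared(calculated, experimental):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: R^2 is taken as the squared Pearson coefficient; the
    source abstract does not specify its definition.
    """
    if len(calculated) != len(experimental) or len(calculated) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(calculated)
    mean_c = sum(calculated) / n
    mean_e = sum(experimental) / n
    # Covariance and variances (unnormalised; the factors of n cancel).
    cov = sum((c - mean_c) * (e - mean_e)
              for c, e in zip(calculated, experimental))
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_c == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_c * var_e)
```

Because the coefficient is squared, a perfectly anti-correlated set of scores also gives 1.0, so the sign of the relationship has to be checked separately.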
There is a tendency in the literature to be critical of scoring functions when docking programs perform poorly. The assumption is that existing scoring functions need to be enhanced or new ones developed in order to improve the performance of docking programs for tasks such as pose prediction and virtual screening. However, failures can result from either sampling or scoring (or a combination of the two), although less emphasis tends to be given to the former. In this work, we use the programs GOLD and Glide on a high-quality data set to explore whether failures in pose prediction and binding affinity estimation are attributable more to sampling or to scoring. We show that identification of the correct pose (docking power) can be improved by incorporating ligand strain into the scoring function or by rescoring an ensemble of diverse docking poses with MM-GBSA in a postprocessing step. We explore the use of nondefault docking settings and find that enhancing ligand sampling also improves docking power, again suggesting that sampling is more limiting than scoring for the docking programs investigated in this work. In cross-docking calculations (docking a ligand to a noncognate receptor structure) we observe a significant reduction in the accuracy of pose ranking, as expected and as reported by others; however, we demonstrate that these alternate poses may in fact be more complementary between the ligand and the rigid receptor conformation, emphasizing that treating the receptor rigidly is an artificial constraint on the docking problem. We simulate protein flexibility by using multiple crystallographic conformations of a protein and demonstrate that docking results can be improved with this level of protein sampling. This work indicates the need for better sampling in docking programs, especially for the receptor. This study also highlights the variable descriptive value of RMSD as the sole arbiter of pose replication quality. It is shown that ligand poses within 2 Å of the crystallographic one can show dramatic differences in calculated relative protein-ligand energies. MM-GBSA rescoring of distinct poses overcomes some of the sensitivities of pose ranking experienced by the docking scoring functions due to protein preparation and binding site definition.
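The RMSD criterion mentioned above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes both poses list the same atoms in the same order and already share the receptor's coordinate frame (no superposition), and it ignores ligand symmetry. The names `pose_rmsd` and `within_cutoff` are hypothetical; the 2 Å default comes from the text.

```python
from math import sqrt

def pose_rmsd(pose, reference):
    """In-place RMSD (Å) between two poses given as lists of (x, y, z).

    Assumptions: identical atom ordering, no superposition, and no
    correction for symmetry-equivalent atoms.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared_sum = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pose, reference)
    )
    return sqrt(squared_sum / len(pose))

def within_cutoff(pose, reference, cutoff=2.0):
    """True if the pose is within `cutoff` Å RMSD of the reference."""
    return pose_rmsd(pose, reference) <= cutoff
```

Two poses that both pass `within_cutoff` can still differ in individual contacts, which is consistent with the abstract's point that RMSD alone does not capture differences in calculated energies.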
Using classification (SOM, LVQ, Binary, Decision Tree) and regression algorithms (PLS, BRANN, k-NN, Linear), this paper details the building of eight 2D-QSAR models from a training set of 266 COX-2 inhibitors. The predictive performances of these eight models were subsequently compared using a test set of 88 COX-2 inhibitors. Each ligand is described by 52 2D descriptors expressed as van der Waals Surface Areas (P_VSA) and by its COX-2 binding IC50. One of our best predictive models is the neural network model (BRANN), which is able to select a subset from the 88-ligand test set that contains 94% active COX-2 inhibitors (pIC50 > 7.5) and detects 71% of all the actives. We then introduce a QSAR consensus prediction protocol that is shown to be more predictive than any single QSAR model: our C3 consensus approach is able to select a subset from the 88-ligand test set that contains 94% active inhibitors and detects 83% of all the actives. The 2D QSAR consensus protocol was finally applied to the high-throughput virtual screening of the NCI database, containing 193,477 organic compounds.
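The two figures reported for each model (the fraction of the selected subset that is active, and the fraction of all actives that is detected) correspond to precision and recall. The sketch below shows how they could be computed for a simple vote-based consensus. The abstract does not define the C3 protocol, so the "at least `min_votes` models agree" rule and the names `consensus_select` and `precision_recall` are assumptions for illustration only.

```python
def consensus_select(predictions, min_votes):
    """Select compounds predicted active by at least `min_votes` models.

    `predictions` maps a model name to a list of booleans, one per
    compound. The voting rule is an assumed stand-in, not necessarily
    the C3 consensus described in the paper.
    """
    per_compound = zip(*predictions.values())
    return [sum(votes) >= min_votes for votes in per_compound]

def precision_recall(selected, is_active):
    """Return (fraction of selected that are active, fraction of actives selected)."""
    true_pos = sum(1 for s, a in zip(selected, is_active) if s and a)
    n_selected = sum(selected)
    n_active = sum(is_active)
    precision = true_pos / n_selected if n_selected else 0.0
    recall = true_pos / n_active if n_active else 0.0
    return precision, recall
```

With hypothetical votes from three models, `consensus_select(votes, min_votes=2)` yields the selection mask, and `precision_recall(mask, labels)` gives the pair of numbers comparable to the "94% active, 83% detected" style of reporting used above.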