A key metric to assess molecular docking remains ligand enrichment against challenging decoys. Whereas the directory of useful decoys (DUD) has been widely used, clear areas for optimization have emerged. Here we describe an improved benchmarking set that includes more diverse targets such as GPCRs and ion channels, totaling 102 proteins with 22886 clustered ligands drawn from ChEMBL, each with 50 property-matched decoys drawn from ZINC. To ensure chemotype diversity, we cluster each target’s ligands by their Bemis–Murcko atomic frameworks. We add net charge to the matched physicochemical properties and include only the most dissimilar decoys, by topology, from the ligands. An online automated tool () generates these improved matched decoys for user-supplied ligands. We test this data set by docking all 102 targets, using the results to improve the balance between ligand desolvation and electrostatics in DOCK 3.6. The complete DUD-E benchmarking set is freely available at .
Molecular docking remains an important tool for structure-based screening to find new ligands and chemical probes. As docking ambitions grow to include new scoring function terms, and to address ever more targets, the reliability and extendability of the orientation sampling, and the throughput of the method, become pressing. Here we explore sampling techniques that eliminate stochastic behavior in DOCK3.6, allowing us to optimize the method for regularly variable sampling of orientations. This also enabled a focused effort to optimize the code for efficiency, with a three-fold increase in the speed of the program. This, in turn, facilitated extensive testing of the method on the 102 targets, 22,805 ligands and 1,411,214 decoys of the Directory of Useful Decoys - Enhanced (DUD-E) benchmarking set, at multiple levels of sampling. Encouragingly, we observe that as sampling increases from 50 to 500 to 2000 to 5000 to 20000 molecular orientations in the binding site (and so from about 1×1010 to 4×1010 to 1×1011 to 2×1011 to 5×1011 mean atoms scored per target, since multiple conformations are sampled per orientation), the enrichment of ligands over decoys monotonically increases for most DUD-E targets. Meanwhile, including internal electrostatics in the evaluation ligand conformational energies, and restricting aromatic hydroxyls to low energy rotamers, further improved enrichment values. Several of the strategies used here to improve the efficiency of the code are broadly applicable in the field.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.