Over the last few years, many new machine learning-based scoring functions for predicting the binding of small molecules to proteins have been developed. Their objective is to approximate the distribution that takes two molecules as input and outputs the energy of their interaction. This distribution depends on the interatomic interactions involved in binding, and only a scoring function that accounts for these interactions can accurately predict binding affinity on unseen molecules. To create a method capable of learning these interactions, we built PointVS: a machine learning-based scoring function which achieves state-of-the-art performance even after rigorous filtering of the training set. This filtering is key, as we found that a commonly used benchmark, CASF-16, overestimates the true accuracy of machine learning-based scoring functions when they are trained using the most commonly used training set. Ranking algorithms using this benchmark rewards memorisation of training data rather than knowledge of the rules of intermolecular binding. We demonstrate that PointVS is able to identify important interactions using attribution, and further, that it can be used to extract important binding pharmacophores from a given protein target when supplied with a number of bound structures. We use this information to perform fragment elaboration, and see improvements in docking scores compared to using structural information from a traditional data-based approach. This not only provides definitive proof that PointVS is learning to identify important binding interactions, but also constitutes the first deep learning-based method for extracting structural information from a target for molecule design.
A novel crystallographic fragment screening data set was generated and used in the SAMPL7 protein-ligand challenge. The SAMPL challenges prospectively assess the predictive power of methods involved in computer-aided drug design. Computational methods are now widely applied to fragment molecules in the search for new drugs. However, there is little in the way of systematic validation specifically for fragment-based approaches. We have performed a large crystallographic high-throughput fragment screen against the therapeutically relevant second bromodomain of the Pleckstrin-homology domain interacting protein (PHIP2) that revealed 52 different fragments bound across 4 distinct sites, 47 of which were bound to the pharmacologically relevant acetylated lysine (Kac) binding site. These data were used to assess computational screening, binding pose prediction and follow-up enumeration. No submission performed better than random at screening. Pose prediction success rates (defined as less than 2 Å root mean squared deviation against heavy atom crystal positions) ranged between 0 and 25%, and only very few follow-up compounds were deemed viable candidates from a medicinal-chemistry perspective based on an analysis of common molecular descriptors. The tight deadlines imposed during the challenge led to a small number of submissions, suggesting that the accuracy of rapidly responsive workflows remains limited. In addition, the application of these methods to reproduce crystallographic fragment data still appears to be very challenging. The results show that there is room for improvement in the development of computational tools, particularly when applied to fragment-based drug design.
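The pose prediction criterion quoted above (less than 2 Å heavy-atom RMSD against the crystal positions) can be illustrated with a minimal Python sketch. The function names `heavy_atom_rmsd` and `success_rate` are hypothetical, and the sketch assumes that both poses are already in the same protein frame, that atoms are listed in the same order, and that hydrogens have been removed. It does not handle ligand symmetry and is not the challenge's actual evaluation code.

```python
import math


def heavy_atom_rmsd(predicted, reference):
    """RMSD in Å between two equally ordered lists of (x, y, z) heavy-atom coordinates.

    Assumption: no superposition is performed, because both poses are taken
    to be expressed in the same protein reference frame.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared_sum = 0.0
    for atom_pred, atom_ref in zip(predicted, reference):
        # Accumulate the squared Euclidean distance for this atom pair.
        squared_sum += sum((p - r) ** 2 for p, r in zip(atom_pred, atom_ref))
    return math.sqrt(squared_sum / len(predicted))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of poses whose RMSD is strictly below the cutoff (2 Å by default)."""
    if not rmsds:
        raise ValueError("at least one RMSD value is required")
    return sum(1 for value in rmsds if value < cutoff) / len(rmsds)


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    docked = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]  # toy pose shifted 1 Å along x
    print(heavy_atom_rmsd(docked, crystal))
    print(success_rate([0.5, 1.9, 2.0, 3.1]))
```

Because the cutoff is strict, a pose at exactly 2.0 Å counts as a failure in this sketch.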
We demonstrate that a simple workflow of array synthesis, combining low-cost robotics with analytic techniques to deconvolute crude reaction mixtures, is an effective way to collect structural data on a binding site. Starting from the high information content of the crystallographic fragment screens on PHIP(2) (second bromodomain of the pleckstrin homology domain interacting protein), a collection of more than 1800 compounds was enumerated. Several thousand Crude Reaction Mixtures (CRMs) were synthesized on one robotic platform, an OpenTrons OT-1 liquid handler, using reaction sequences of up to 5 chemical steps. Analysis via MScheck, an algorithm-based system for finding an m/z in a CRM, significantly shortened product identification protocol times. 969 usable X-ray diffraction datasets were acquired, which resolved as 22 reaction products binding to the protein: 19 with conserved poses relative to the original fragment and 3 with a new, unexpected binding pose. The 22 crystallographic hit compounds were subsequently tested with a peptide displacement AlphaScreen assay and time-resolved grating-coupled interferometry-based biosensor assays, which confirmed one molecule, derived from an inactive fragment, with an IC50 = 34 μM and KD = 50 μM. The procedures described are entirely formulaic and engineerable, and the method is eminently scalable. We anticipate that this cheap, low solvent-use approach will yield vast amounts of data, enabling rapid structural SAR landscape exploration around fragments, leading to faster fragment-to-lead times.
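The abstract does not describe how MScheck works, so the following Python sketch only illustrates the general idea of finding an expected m/z in a peak list. Everything in it is an assumption: the function name `find_mz`, the choice of the singly protonated [M+H]+ ion, the ppm tolerance, and the peak list format of (m/z, intensity) pairs.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def find_mz(peaks, neutral_mass, ppm_tol=10.0, min_intensity=0.0):
    """Return the most intense peak matching the [M+H]+ ion of a product, or None.

    peaks: iterable of (mz, intensity) pairs from a crude reaction mixture spectrum.
    neutral_mass: monoisotopic mass of the expected product in Da.
    Assumption: only the singly protonated adduct is searched for.
    """
    target = neutral_mass + PROTON_MASS
    # Convert the relative ppm tolerance into an absolute m/z window.
    tolerance = target * ppm_tol * 1e-6
    hits = [
        (mz, intensity)
        for mz, intensity in peaks
        if abs(mz - target) <= tolerance and intensity >= min_intensity
    ]
    return max(hits, key=lambda peak: peak[1]) if hits else None


if __name__ == "__main__":
    # Toy spectrum; the values are illustrative, not measured data.
    spectrum = [(150.0, 100.0), (181.0779, 5000.0), (181.0800, 200.0)]
    print(find_mz(spectrum, 180.0706))
```

A real workflow would likely also consider other adducts, isotope patterns and retention time, none of which are modelled here.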