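Harold Grosjean: scite author profile

A Step Towards Generalisability: Training a Machine Learning Scoring Function for Structure-Based Virtual Screening
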
Over the last few years, many new machine learning-based scoring functions for predicting the binding of small molecules to proteins have been developed. Their objective is to approximate the function that takes two molecules as input and outputs the energy of their interaction. This function depends on the interatomic interactions involved in binding, and only a scoring function that accounts for these interactions can accurately predict binding affinity for unseen molecules. To create a method capable of learning these interactions, we built PointVS, a machine learning-based scoring function that achieves state-of-the-art performance even after rigorous filtering of the training set. This filtering is key, as we found that a commonly used benchmark, CASF-16, overestimates the true accuracy of machine learning-based scoring functions when they are trained on the most commonly used training set. Ranking algorithms by their performance on this benchmark rewards memorisation of the training data rather than knowledge of the rules of intermolecular binding. We demonstrate that PointVS can identify important interactions using attribution and, further, that it can extract important binding pharmacophores for a given protein target when supplied with a number of bound structures. We use this information to perform fragment elaboration and observe improved docking scores compared with using structural information from a traditional data-based approach. This not only provides definitive proof that PointVS is learning to identify important binding interactions, but also constitutes the first deep learning-based method for extracting structural information from a target for molecule design.
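
The abstract argues that benchmark scores are only meaningful once training complexes that overlap with the test set are filtered out. The abstract does not specify the PointVS filtering protocol, so the following is only a minimal sketch of one common form of such filtering: dropping any training complex whose ligand is too similar, by Tanimoto similarity, to a test ligand. The fingerprint representation (a set of "on" bit indices), the function names and the 0.9 cut-off are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List

# Assumption: each ligand is represented as the set of "on" bit indices
# of a binary fingerprint (e.g. a hashed substructure fingerprint).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def filter_training_set(
    train: Dict[str, Fingerprint],
    test: Dict[str, Fingerprint],
    threshold: float = 0.9,  # illustrative cut-off, not taken from the paper
) -> List[str]:
    """Return training IDs whose ligand is below `threshold` similarity to every test ligand."""
    kept = []
    for train_id, train_fp in train.items():
        # Highest similarity of this training ligand to any test ligand.
        max_sim = max((tanimoto(train_fp, fp) for fp in test.values()), default=0.0)
        if max_sim < threshold:
            kept.append(train_id)
    return kept
```

For example, with a training set `{"1abc": frozenset({1, 2, 3, 4}), "2xyz": frozenset({10, 11, 12})}` and a test set containing a ligand identical to `1abc`, only `"2xyz"` is kept. Real pipelines often add protein-level criteria (sequence or pocket similarity) as well; this sketch covers only the ligand side.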

SAMPL7 protein-ligand challenge: A community-wide evaluation of computational methods against fragment screening and pose-prediction

Grosjean, Işık, Aimon, et al. (2022). J Comput Aided Mol Des

A novel crystallographic fragment screening data set was generated and used in the SAMPL7 protein-ligand challenge. The SAMPL challenges prospectively assess the predictive power of methods involved in computer-aided drug design. Various computational methods applied to fragment molecules are now widely used in the search for new drugs; however, there is little systematic validation specifically for fragment-based approaches. We performed a large crystallographic high-throughput fragment screen against the therapeutically relevant second bromodomain of the pleckstrin homology domain interacting protein (PHIP2), which revealed 52 different fragments bound across 4 distinct sites, 47 of which were bound to the pharmacologically relevant acetylated lysine (Kac) binding site. These data were used to assess computational screening, binding pose prediction and follow-up enumeration. For screening, all submissions performed no better than random. Pose prediction success rates (defined as a heavy-atom root-mean-square deviation (RMSD) of less than 2 Å from the crystal positions) ranged between 0 and 25%, and, based on an analysis of common molecular descriptors, very few follow-up compounds were deemed viable candidates from a medicinal-chemistry perspective. The tight deadlines imposed during the challenge resulted in a small number of submissions, and the results suggest that the accuracy of rapidly responsive workflows remains limited. In addition, reproducing crystallographic fragment data with these methods still appears to be very challenging. The results show that there is room for improvement in computational tools, particularly when applied to fragment-based drug design.
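
The pose-prediction criterion above (heavy-atom RMSD below 2 Å against the crystal positions) can be made concrete with a short sketch. It assumes the predicted and crystal heavy atoms are already matched one-to-one in the same order and expressed in the protein's reference frame, so no superposition or symmetry correction is applied; the challenge's actual scoring scripts may handle atom mapping and symmetry differently. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def heavy_atom_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD (in Å) between predicted and crystal heavy-atom positions.

    Assumes both sequences list the same heavy atoms in the same order and
    share the protein's frame, so no alignment or symmetry handling is done.
    """
    if len(pred) != len(ref) or not ref:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pred, ref)
    )
    return math.sqrt(squared / len(ref))


def success_rate(rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Fraction of poses whose RMSD is strictly below `cutoff` (in Å)."""
    if not rmsds:
        raise ValueError("no poses to evaluate")
    return sum(r < cutoff for r in rmsds) / len(rmsds)
```

For instance, `success_rate([1.0, 2.5, 3.0, 0.5])` gives 0.5, i.e. a 50% success rate; the submissions in the challenge reached between 0 and 25% under this kind of criterion.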

High-throughput crystallography for rapid fragment growth from crude arrays by low-cost robotics

Grosjean, Aimon, Hassell-Hart, et al. (2023). Preprint

We demonstrate that a simple workflow of array synthesis, combining low-cost robotics with analytical techniques to deconvolute crude reaction mixtures, is an effective way to collect structural data on a binding site. Starting from the high information content of the crystallographic fragment screens on PHIP(2) (the second bromodomain of the pleckstrin homology domain interacting protein), a collection of more than 1800 compounds was enumerated. Several thousand crude reaction mixtures (CRMs) were synthesized on a single robotic platform, an OpenTrons OT-1 liquid handler, using reaction sequences of up to 5 chemical steps. Analysis with MScheck, an algorithm-based system for finding a given m/z in a CRM, significantly shortened the time needed for product identification. In total, 969 usable X-ray diffraction datasets were acquired, revealing 22 reaction products bound to the protein: 19 with poses conserved relative to the original fragment and 3 with a new, unexpected binding pose. The 22 crystallographic hits were subsequently tested in a peptide-displacement AlphaScreen assay and in time-resolved grating-coupled interferometry (GCI) biosensor assays, which confirmed one molecule, grown from an inactive fragment, with IC50 = 34 μM and KD = 50 μM. The procedures described are entirely formulaic and engineerable, and the method is eminently scalable. We anticipate that this cheap, low-solvent approach will yield vast amounts of data, enabling rapid exploration of the structural SAR landscape around fragments and leading to faster fragment-to-lead times.
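
The abstract does not describe how MScheck works internally. As a hypothetical illustration of the basic idea of finding an expected m/z in a crude reaction mixture, the sketch below computes the singly protonated [M+H]+ ion of the intended product and checks whether any observed peak lies within a ppm tolerance. Positive-mode ionisation, the [M+H]+ adduct only, a pre-centroided peak list and the 10 ppm default are all assumptions, and the function names are not MScheck's.

```python
from typing import Iterable

PROTON_MASS = 1.007276  # mass of a proton in Da


def mh_plus(monoisotopic_mass: float) -> float:
    """m/z of the singly protonated [M+H]+ ion (charge +1)."""
    return monoisotopic_mass + PROTON_MASS


def product_detected(
    monoisotopic_mass: float,
    peaks: Iterable[float],
    tol_ppm: float = 10.0,  # assumed tolerance, not taken from the paper
) -> bool:
    """True if any observed m/z lies within tol_ppm of the expected [M+H]+ ion."""
    target = mh_plus(monoisotopic_mass)
    return any(abs(mz - target) / target * 1e6 <= tol_ppm for mz in peaks)
```

For a product of monoisotopic mass 200.0 Da, a peak at 201.0075 (about 1 ppm from the expected 201.007276) counts as a detection, whereas a peak at 201.05 (about 200 ppm away) does not. A practical tool would also consider other adducts such as [M+Na]+ and the instrument's actual mass accuracy.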
