With the ongoing rapid growth of publicly available ligand–protein bioactivity data, there is a trove of valuable data that can be used to train a plethora of machine-learning algorithms. However, not all data is equal in terms of size and quality, and researchers spend a significant portion of their time adapting the data to their needs. On top of that, finding the right data for a research question can be a challenge of its own. To meet these challenges, we have constructed the Papyrus dataset, which comprises around 60 million data points. It combines multiple large publicly available datasets, such as ChEMBL and ExCAPE-DB, with several smaller datasets containing high-quality data. The aggregated data has been standardised and normalised in a manner suitable for machine learning. We show how the data can be filtered in a variety of ways and present examples of quantitative structure–activity relationship analyses and proteochemometric modelling. Our ambition is that this pruned data collection constitutes a benchmark set that can be used for constructing predictive models, while also providing an accessible data source for research.
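As a rough illustration of the kind of filtering the abstract refers to, the sketch below selects the records for one protein target at a chosen quality level. The record layout and the field names (`target`, `compound`, `quality`, `activity`) are hypothetical and are not taken from the Papyrus schema, and the values are invented for the example.

```python
# Hypothetical bioactivity records; the field names and values are
# illustrative only and do not reflect the actual Papyrus columns.
records = [
    {"target": "P1", "compound": "c1", "quality": "high", "activity": 7.2},
    {"target": "P1", "compound": "c2", "quality": "low", "activity": 5.1},
    {"target": "P2", "compound": "c3", "quality": "high", "activity": 6.0},
    {"target": "P1", "compound": "c4", "quality": "high", "activity": None},
]


def filter_records(rows, target, allowed_quality=("high",)):
    """Keep rows for one target that have an accepted quality label
    and a non-missing activity value."""
    return [
        row
        for row in rows
        if row["target"] == target
        and row["quality"] in allowed_quality
        and row["activity"] is not None
    ]


selected = filter_records(records, "P1")
print([row["compound"] for row in selected])
```

Relaxing `allowed_quality` to `("high", "low")` trades data quality for data volume, which is the kind of choice a curated, quality-annotated collection is meant to make explicit.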
Drug discovery programs for covalent, irreversible, mechanism-based enzyme inhibitors often focus on optimizing potency as determined by IC50 values in biochemical assays. These assays do not allow the binding activity (Ki) and reactivity (kinact) of covalent inhibitors to be characterized as individual kinetic parameters. Here, we report the development of a kinetic substrate assay to study the influence of the acidity (pKa) of the heterocyclic leaving group of triazole urea derivatives as diacylglycerol lipase (DAGL)-α inhibitors. Surprisingly, we found that the reactivity of the inhibitors did not correlate with the pKa of the leaving group, whereas the position of the nitrogen atoms in the heterocyclic core largely determined the binding activity of the inhibitor. This finding was confirmed and clarified by molecular dynamics simulations on the covalently bound Michaelis–Menten complex. A deeper understanding of the binding properties of covalent serine hydrolase inhibitors is expected to aid in the discovery and development of more selective covalent inhibitors.
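To illustrate why a single IC50 value cannot separate binding from reactivity, the sketch below uses the standard two-step model of irreversible inhibition, in which the observed inactivation rate depends hyperbolically on inhibitor concentration. The abstract does not state this model, so it is an assumption here, as are the parameter values, the units and the neglect of substrate competition.

```python
import math


def k_obs(conc, k_inact, K_I):
    """Observed pseudo-first-order inactivation rate for the two-step
    scheme E + I <-> E.I -> E-I (assumed model, no substrate competition)."""
    return k_inact * conc / (K_I + conc)


def ic50_at_time(t, k_inact, K_I):
    """Inhibitor concentration that leaves 50% enzyme activity after time t.

    Solves exp(-k_obs * t) = 0.5 for the concentration. Returns None when
    k_inact is too small to reach 50% inactivation within t.
    """
    target_rate = math.log(2) / t
    if k_inact <= target_rate:
        return None
    return K_I * target_rate / (k_inact - target_rate)


# Two hypothetical inhibitors with the same k_inact / K_I ratio:
# one binds tightly but reacts slowly, the other binds weakly but reacts fast.
tight_slow = {"k_inact": 0.01, "K_I": 0.1}  # assumed units: 1/s and uM
weak_fast = {"k_inact": 0.1, "K_I": 1.0}

incubation = 1800.0  # seconds, illustrative value
ic50_tight_slow = ic50_at_time(incubation, **tight_slow)
ic50_weak_fast = ic50_at_time(incubation, **weak_fast)
print(ic50_tight_slow, ic50_weak_fast)
```

Under these assumptions the two inhibitors give nearly the same IC50 although their Ki and kinact values each differ tenfold, which is why an assay that resolves the two parameters separately is needed to see which one a structural change actually affects.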