Chemists intuitively communicate molecules via two-dimensional structure diagrams. These straightforward visual representations are generally preferred to the often complicated systematic chemical names. For chemical patterns, however, no comparable visualization standard has evolved so far. Chemical patterns, which describe chemical features, are needed whenever a set of molecules is filtered for certain properties. The currently available representations are limited to linear molecular pattern languages, which are hardly human-readable and therefore keep chemists without a computational background from systematically formulating patterns. We therefore introduce a new visualization concept for chemical patterns. The standard concept of structure diagrams is extended to account for property descriptions and logical combinations of chemical features in patterns. As a first application of the new concept, we developed the SMARTSviewer, a tool that converts chemical patterns encoded as SMARTS strings into a visual representation. The graphical pattern depiction provides an overview of the specified chemical features, variations, and similarities without the need to decode the often cryptic linear expressions. Using recent chemical publications from various fields, we demonstrate the wide application range of a graphical chemical pattern language.
Protein-ligand interactions are the fundamental basis for molecular design in pharmaceutical research, biocatalysis, and agrochemical development. Hydrogen bonds in particular are known to have special geometric requirements and therefore deserve a detailed analysis. Modeling approaches apply a more general description of hydrogen bond geometries based on distance and directionality. A first study of these geometries was performed in 1982 based on 15 protein structures. Currently, about 95 000 protein-ligand structures are available in the PDB, providing a solid foundation for a new large-scale statistical analysis. Here, we report a comprehensive investigation of geometric and functional properties of hydrogen bonds. Of 22 defined functional groups, eight are fully in accordance with theoretical predictions, while 14 show variations from expected values. On the basis of these results, we derived interaction geometries to improve current computational models. These observations are expected to be useful in designing new chemical structures for biological applications.
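The description of a hydrogen bond by distance and directionality can be illustrated with a minimal sketch. It is not the geometry model derived in the study: the function names and the cutoff values (`max_dist=3.5` Å for the donor-acceptor distance, `min_angle=120.0` degrees for the donor-hydrogen-acceptor angle) are assumptions chosen only for illustration.

```python
import math


def angle_deg(a, b, c):
    """Return the angle at vertex b (in degrees) formed by points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def is_hydrogen_bond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=120.0):
    """Check a simple distance + directionality criterion.

    The cutoffs are illustrative assumptions, not values from the study.
    Coordinates are (x, y, z) tuples in Angstrom.
    """
    dist = math.dist(donor, acceptor)                # donor-acceptor distance
    angle = angle_deg(donor, hydrogen, acceptor)     # D-H...A angle at H
    return dist <= max_dist and angle >= min_angle


# A linear arrangement passes; a sharply bent one does not.
linear = is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0))
bent = is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0))
```

A statistical analysis such as the one reported would replace the fixed cutoffs with distributions observed per functional group.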
Computational target prediction for bioactive compounds is a promising field in assessing off-target effects. Structure-based methods predict not only off-targets but also binding modes, which are essential for understanding the mode of action and rationally designing selective compounds. Here, we highlight the current open challenges of computational target prediction methods based on protein structures and show why inverse screening, rather than sequential pairwise protein-ligand docking, is needed. A new inverse screening method based on triangle descriptors is introduced: iRAISE (inverse Rapid Index-based Screening Engine). A scoring cascade that considers the reference ligand as well as ligand and active site coverage is applied to overcome the interprotein scoring noise of common protein-ligand scoring functions. Furthermore, a statistical evaluation of a score cutoff for each individual protein pocket is used. The ranking and binding mode prediction capabilities are evaluated on different datasets and compared to inverse docking and pharmacophore-based methods. On the Astex Diverse Set, iRAISE ranks more than 35% of the targets in first position and predicts more than 80% of the binding modes with a root-mean-square deviation (RMSD) of <2.0 Å. With a median computing time of 5 s per protein, large numbers of protein structures can be screened rapidly. On a test set with 7915 protein structures and 117 query ligands, iRAISE predicts the first true positive in a ranked list among the top eight ranks (median), i.e., among 0.28% of the targets.
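The RMSD criterion for binding modes can be sketched as follows. This is a simplified illustration, not the evaluation code used for iRAISE: it assumes both poses are given in the same coordinate frame with atoms listed in the same order, and it ignores molecular symmetry. The function names and the default `cutoff=2.0` are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists.

    Assumes identical atom ordering and a shared coordinate frame
    (no superposition is performed). Coordinates are in Angstrom.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def is_correct_pose(predicted, reference, cutoff=2.0):
    """Apply the RMSD < 2.0 Angstrom criterion to a predicted pose."""
    return rmsd(predicted, reference) < cutoff


# Hypothetical three-atom ligand, predicted pose shifted by 1 Angstrom in x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(x + 1.0, y, z) for x, y, z in reference]
value = rmsd(predicted, reference)
```

Because every atom is displaced by the same vector, the RMSD equals the length of that vector.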
Structure-based computational target prediction methods identify potential targets for a bioactive compound. Methods based on protein-ligand docking still face many challenges, the greatest of which is probably the ranking of true targets in a large data set of protein structures. Currently, no standard data sets for evaluation exist, making it cumbersome to compare methods and demonstrate improvements. Therefore, we propose two data sets and evaluation strategies for a meaningful evaluation of new target prediction methods: a small data set consisting of three target classes for detailed proof-of-concept and selectivity studies, and a large data set consisting of 7992 protein structures and 72 drug-like ligands, allowing statistical evaluation with performance metrics on a drug-like chemical space. Both data sets are built from openly available resources, and all information needed to perform the described experiments is reported. We describe the composition of the data sets, the setup of screening experiments, and the evaluation strategy. Performance metrics capable of measuring the early recognition of enrichment, such as AUC, BEDROC, and NSLR, are proposed. We apply a sequence-based target prediction method to the large data set to analyze its content of nontrivial evaluation cases. The proposed data sets are used to evaluate our new inverse screening method iRAISE. The small data set reveals the method's capabilities and limitations in distinguishing selectively between rather similar protein structures. The large data set simulates real target identification scenarios. iRAISE achieves excellent or good enrichment in 55% of cases, a median AUC of 0.67, and RMSDs below 2.0 Å for 74%, and it predicts the first true target within the top 2% of the protein data set of about 8000 structures in 59 out of 72 cases.
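Two of the quantities reported above, the AUC of a ranked target list and the position of the first true target, can be sketched in a few lines. This is an illustrative sketch rather than the published evaluation protocol: the function names and the boolean input format are assumptions, ties in score are not handled, and BEDROC and NSLR, which weight early ranks more strongly, are not implemented here.

```python
def roc_auc(ranked_labels):
    """ROC AUC for a list ranked best-first; True marks a true target.

    Computed as the fraction of (true target, non-target) pairs in which
    the true target is ranked above the non-target.
    """
    n_pos = sum(1 for label in ranked_labels if label)
    n_neg = len(ranked_labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true target and one non-target")
    negatives_seen = 0
    pairs_won = 0
    for label in ranked_labels:
        if label:
            # This true target outranks every non-target not yet seen.
            pairs_won += n_neg - negatives_seen
        else:
            negatives_seen += 1
    return pairs_won / (n_pos * n_neg)


def first_true_rank_fraction(ranked_labels):
    """Rank of the first true target as a fraction of the list length."""
    rank = ranked_labels.index(True) + 1  # raises ValueError if none present
    return rank / len(ranked_labels)


# Hypothetical ranking of four protein structures for one query ligand.
ranking = [True, False, True, False]
auc = roc_auc(ranking)                        # 3 of 4 pairs ordered correctly
fraction = first_true_rank_fraction(ranking)  # first true target at rank 1 of 4
```

The rank fraction corresponds to statements of the form "first true target in the top 2% of the data set", while the AUC summarizes the whole ranking.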