Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that purges data sets of compounds active against pharmaceutically relevant targets from unselective hits. Topological optimization using experimental design strategies monitored by refined nearest neighbor analysis functions is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
A common finding of many reports evaluating ligand-based virtual screening methods is that validation results vary considerably with changing benchmark data sets. It is widely assumed that these data set specific effects are caused by the redundancy, self-similarity, and cluster structure inherent to those data sets. These phenomena manifest themselves in the data sets' representation in descriptor space, which is termed the data set topology. A methodology for the characterization of data set topology based on spatial statistics is introduced. The method is nonparametric and can deal with arbitrary distributions of descriptor values. With this methodology it is possible to associate differences in virtual screening performance on different data sets with differences in data set topology. Moreover, the better virtual screening performance of certain descriptors can be explained by their ability to represent the benchmark data sets with a more favorable topology. Finally, it is shown that the composition of some benchmark data sets causes topologies that lead to overoptimistic validation results even in very "simple" descriptor spaces. Spatial statistics analysis as proposed here facilitates the detection of such biased data sets and may provide a tool for the future design of unbiased benchmark data sets.
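As a rough illustration of the kind of nearest neighbor functions used in spatial statistics, the sketch below computes two empirical distribution functions over descriptor space: G(t), the fraction of actives whose nearest other active lies within distance t, and F(t), the fraction of decoys whose nearest active lies within distance t. The mapping of G to active-active distances and F to decoy-active distances, the Euclidean metric, the function names, and the toy coordinates are all assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def nearest_distances(points, references, exclude_self=False):
    """Euclidean distance from each point to its nearest reference point."""
    diff = points[:, None, :] - references[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if exclude_self:
        # points and references are the same set: ignore zero self-distances
        np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1)

def empirical_cdf(distances, thresholds):
    """Fraction of distances <= t, evaluated for every threshold t."""
    return (distances[None, :] <= thresholds[:, None]).mean(axis=1)

def nn_functions(actives, decoys, thresholds):
    """Return (G, F) evaluated at the given thresholds (assumed definitions)."""
    g = empirical_cdf(nearest_distances(actives, actives, exclude_self=True), thresholds)
    f = empirical_cdf(nearest_distances(decoys, actives), thresholds)
    return g, f

# Toy 2D "descriptor space" with invented coordinates
actives = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
decoys = np.array([[0.0, 3.0], [4.0, 5.0]])
thresholds = np.array([0.5, 1.0, 2.0, 10.0])

g, f = nn_functions(actives, decoys, thresholds)
print("G(t):", g)
print("F(t):", f)
```

Under these assumed definitions, a G curve that rises well before F suggests that actives sit closer to one another than decoys sit to actives, which is one way clustering among actives could show up. Because the functions are plain empirical distributions of distances, no assumption about the distribution of descriptor values is needed.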
Nipah virus (NiV), a highly pathogenic paramyxovirus, causes respiratory disease in pigs and severe febrile encephalitis in humans with high mortality rates. On the basis of the structural similarity of viral fusion (F) proteins within the family Paramyxoviridae, we designed 18 quinolone derivatives and tested them in a NiV and measles virus (MV) envelope protein-based fusion assay, alongside an evaluation of cytotoxicity. We found five compounds that successfully inhibited NiV envelope protein-induced cell fusion. The most active molecules (19 and 20), which also inhibit the syncytium formation induced by infectious NiV and show low cytotoxicity in Vero cells, represent a promising quinolone-type lead structure. Molecular modeling indicated that compound 19 fits well into a particular protein cavity present on the NiV F protein that is important for the fusion process.
A series of cis-configured epoxides and aziridines containing hydrophobic moieties and amino acid esters were synthesized as new potential inhibitors of the secreted aspartic protease 2 (SAP2) of Candida albicans. Enzyme assays revealed the N-benzyl-3-phenyl-substituted aziridines 11 and 17 as the most potent inhibitors, with second-order inhibition rate constants (k₂) between 56,000 and 121,000 M⁻¹ min⁻¹. The compounds were shown to be pseudo-irreversible dual-mode inhibitors: the intermediate esterified enzyme resulting from nucleophilic ring opening was hydrolyzed and yielded amino alcohols as transition-state-mimetic reversible inhibitors. The results of docking studies with the ring-closed aziridine forms of the inhibitors suggest binding modes mainly dominated by hydrophobic interactions with the S1, S1', S2, and S2' subsites of the protease, and docking studies with the processed amino alcohol forms predict additional hydrogen bonds of the new hydroxy group to the active site Asp residues. C. albicans growth assays showed the compounds to decrease SAP2-dependent growth while not affecting SAP2-independent growth.
The rapid emergence of pesticide resistance has given rise to a demand for herbicides with new modes of action (MoA). In the agrochemical sector, with the availability of experimental high throughput screening (HTS) data, it is now possible to utilize in silico target prediction methods in the early discovery phase to suggest the MoA of a compound via data mining of bioactivity data. Although established in the pharmaceutical context, this approach poses rather different challenges in the agrochemical area, as we have found in this work, partly due to different chemistry, but even more so due to different (usually smaller) amounts of data and different ways of conducting HTS. With the aim of applying computational methods to facilitate herbicide target identification, 48,000 bioactivity data points against 16 herbicide targets were processed to train Laplacian-modified Naïve Bayesian (NB) classification models. The herbicide target prediction model ("HerbiMod") is an ensemble of 16 binary classification models, which are evaluated on internal, external and prospective validation sets. In addition to the experimental inactives, 10,000 random agrochemical inactives were included in the training process, which was shown to improve the overall balanced accuracy of our models by up to 40%. For all the models, a balanced accuracy of ≥80% was achieved in five-fold cross validation. Ranking of target predictions was addressed by means of z-scores, which improved predictivity over using raw scores alone. An external test set of 247 compounds from ChEMBL and a prospective test set of 394 compounds from BASF SE tested against five well-studied herbicide targets (ACC, ALS, HPPD, PDS and PROTOX) were used for further validation.
Only 4% of the compounds in the external test set lay within the applicability domain, and extrapolation (and correct prediction) was hence impossible, which on the one hand was surprising and on the other hand illustrated the utility of using applicability domains in the first place. However, a balanced accuracy better than 60% was achieved on the prospective test set, where all the compounds fell within the applicability domain, which underlines the possibility of using target prediction in the area of agrochemicals as well.
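Two quantities in this abstract lend themselves to a short sketch: balanced accuracy, taken here in its usual sense as the mean of sensitivity and specificity, and z-score ranking of per-target scores. The abstract does not say how the z-scores were computed, so standardizing each model's raw score against a per-model background mean and standard deviation is an assumption, and every number below is invented purely for illustration. Only the target names come from the text.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = (y_pred & y_true).sum() / y_true.sum()
    specificity = (~y_pred & ~y_true).sum() / (~y_true).sum()
    return (sensitivity + specificity) / 2

def z_scores(raw_scores, background_mean, background_std):
    """Standardize raw model scores against per-model background statistics.

    Assumption: each binary model has its own background score distribution.
    """
    raw = np.asarray(raw_scores, dtype=float)
    return (raw - np.asarray(background_mean)) / np.asarray(background_std)

# Invented labels: 2 actives, 6 inactives; one active missed, one false positive
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
ba = balanced_accuracy(y_true, y_pred)

# Invented raw scores of one compound against three of the named targets
targets = ["ACC", "ALS", "HPPD"]
raw = [10.0, 4.0, 6.0]
z = z_scores(raw, background_mean=[8.0, 1.0, 6.0], background_std=[2.0, 1.0, 3.0])

rank_raw = [targets[i] for i in np.argsort(-np.asarray(raw))]
rank_z = [targets[i] for i in np.argsort(-z)]
print("balanced accuracy:", ba)
print("ranking by raw score:", rank_raw)
print("ranking by z-score:", rank_z)
```

Balanced accuracy weights the active and inactive classes equally, which matters when inactives heavily outnumber actives. In the toy ranking, the order of targets changes once each score is judged relative to its own model's background, which is the general motivation for standardizing scores before comparing them across models.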