We describe FieldScreen, a ligand-based virtual screening (VS) method. Its use of 3D molecular fields makes it particularly suitable for scaffold hopping, and we have rigorously validated it for this purpose using a clustered version of the Directory of Useful Decoys (DUD). Using thirteen pharmaceutically relevant targets, we show that FieldScreen produces better early chemotype enrichment than DOCK. In addition, the hits retrieved by FieldScreen are consistently lower in molecular weight than those retrieved by docking. Where no X-ray protein structures are available, FieldScreen searches are more robust than docking into homology models or apo structures.
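The abstract does not say which early-enrichment metric was used. As a generic illustration only, the sketch below computes the common enrichment factor (EF) at a fraction of a best-first ranking, alongside a chemotype-aware count of how many distinct active chemotypes appear in that same fraction. The function names and the convention of labelling decoys `None` are hypothetical.

```python
import math

def enrichment_factor(is_active, fraction):
    """Hit rate in the top `fraction` of a best-first ranking divided by the
    overall hit rate (assumes at least one active)."""
    n_total = len(is_active)
    n_top = max(1, math.ceil(fraction * n_total))
    hit_rate_top = sum(is_active[:n_top]) / n_top
    hit_rate_all = sum(is_active) / n_total
    return hit_rate_top / hit_rate_all

def chemotype_recovery(chemotypes, fraction):
    """Fraction of distinct active chemotypes with at least one member in the
    top `fraction`; decoys are labelled None (hypothetical convention)."""
    n_top = max(1, math.ceil(fraction * len(chemotypes)))
    all_types = {c for c in chemotypes if c is not None}
    found = {c for c in chemotypes[:n_top] if c is not None}
    return len(found) / len(all_types)

# Toy best-first ranking: 5 actives from chemotypes A, B, C among 20 molecules.
ranking = ["A", "A", None, "A", None, "B"] + [None] * 13 + ["C"]
is_active = [c is not None for c in ranking]
print(enrichment_factor(is_active, 0.10))   # 4.0
print(chemotype_recovery(ranking, 0.10))    # 1 of 3 chemotypes
```

In this toy ranking the top 10% gives EF = 4.0 yet recovers only one of the three chemotypes, which is the kind of gap a chemotype-level measure is meant to expose.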
In this review, we highlight recent applications of machine learning to virtual screening, focusing on supervised techniques that train statistical learning algorithms to prioritize the molecules in a database by their likelihood of activity against a particular protein target. Both ligand-based similarity searching and structure-based docking have benefited from machine learning algorithms, including naïve Bayesian classifiers, support vector machines, neural networks, and decision trees, as well as from more traditional regression techniques. Applying these methods effectively requires an appreciation of data preparation, validation, optimization, and search methodologies, and we also survey developments in these areas.
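To make the supervised-ranking idea concrete, here is a minimal sketch of one of the listed techniques: a Bernoulli naive Bayes classifier (one common variant for binary fingerprints) trained on labelled molecules and then used to rank a database by log posterior odds of activity. The fingerprints are toy sets of "on" bit indices, not the output of any real fingerprinting library, and the function names are hypothetical.

```python
import math
from collections import Counter

def train_bernoulli_nb(fingerprints, labels, n_bits, alpha=1.0):
    """Fit a Bernoulli naive Bayes model to binary fingerprints.

    fingerprints: list of sets of 'on' bit indices; labels: list of bools
    (True = active, both classes assumed present). alpha is a Laplace
    pseudocount that keeps every per-bit probability away from 0 and 1."""
    n_act = sum(labels)
    n_inact = len(labels) - n_act
    on_act, on_inact = Counter(), Counter()
    for fp, active in zip(fingerprints, labels):
        (on_act if active else on_inact).update(fp)
    log_odds_on, log_odds_off = [], []
    for b in range(n_bits):
        p_act = (on_act[b] + alpha) / (n_act + 2 * alpha)        # P(bit on | active)
        p_inact = (on_inact[b] + alpha) / (n_inact + 2 * alpha)  # P(bit on | inactive)
        log_odds_on.append(math.log(p_act / p_inact))
        log_odds_off.append(math.log((1 - p_act) / (1 - p_inact)))
    prior = math.log(n_act / n_inact)
    return log_odds_on, log_odds_off, prior

def score(model, fp):
    """Log posterior odds of activity under the naive independence assumption."""
    log_odds_on, log_odds_off, prior = model
    return prior + sum(log_odds_on[b] if b in fp else log_odds_off[b]
                       for b in range(len(log_odds_on)))

train_fps = [{0, 1}, {0, 1, 2}, {0, 2}, {3, 4}, {2, 3}, {4}]
train_labels = [True, True, True, False, False, False]
model = train_bernoulli_nb(train_fps, train_labels, n_bits=5)
database = [{3, 4}, {0, 1}, {2}]
ranked = sorted(database, key=lambda fp: score(model, fp), reverse=True)
print(ranked)
```

Only the ordering matters for prioritization, so the prior term could be dropped without changing the ranking.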
We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and three ensemble decision-tree methods (boosting, bagging, and random forests), using eight data sets and two sets of descriptors. Using rigorous multiple-comparison statistical tests, we show that these techniques can provide consistent improvements in predictive performance over single decision trees. However, no single method among them clearly performs best. This motivates a more in-depth investigation into the properties of random forests, in which we identify a set of random forest parameters that gives optimal performance across all the studied data sets. Additionally, the tree ensemble structure of the forest may provide an interpretable model, a considerable advantage over SVMs; we test this possibility and compare it with standard decision tree models.
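For readers unfamiliar with how a random forest differs from bagging, the compact sketch below grows each tree on a bootstrap sample and, at every node, searches only a random subset of `mtry` features. The abstract does not state which parameter values the authors found optimal, so the defaults here are arbitrary, and the Gini split criterion and all names are assumptions for illustration. Setting `mtry` to the total number of features reduces the method to plain bagging of trees.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_split(X, y, features):
    """Return (impurity, feature, threshold) minimising weighted Gini, or None."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or imp < best[0]:
                best = (imp, f, t)
    return best

def build_tree(X, y, max_depth, mtry, rng):
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority                                # leaf: class label
    features = rng.sample(range(len(X[0])), mtry)      # random feature subset per node
    split = best_split(X, y, features)
    if split is None:
        return majority
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], max_depth - 1, mtry, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], max_depth - 1, mtry, rng))

def predict_tree(node, row):
    while isinstance(node, tuple):                     # internal node: (f, t, left, right)
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(X, y, n_trees=50, mtry=1, max_depth=5, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, mtry, rng))
    return forest

def predict_forest(forest, row):
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

# Toy data: feature 0 determines the class, feature 1 is noise.
X = [[i, (i * 3) % 7] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
forest = train_forest(X, y, n_trees=25, mtry=1, max_depth=5, seed=0)
print(predict_forest(forest, [2, 6]), predict_forest(forest, [17, 2]))
```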
Chemotype enrichment is increasingly recognized as an important measure of virtual screening performance. However, little attention has been paid to producing metrics that can quantify chemotype retrieval. Here, we examine two protocols for analyzing chemotype retrieval: "cluster averaging", where the contribution of each active to the scoring metric is inversely proportional to the number of actives with the same chemotype, and "first found", where only the first active found for a given chemotype contributes to the score. We show that the latter protocol, which is common in the qualitative analyses used in the current literature, has important drawbacks when combined with quantitative metrics.
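The two protocols can be illustrated with a weighted ROC AUC, where each active contributes the fraction of decoys ranked below it, multiplied by a per-active weight. This is a minimal sketch assuming a best-first ranking without ties and with at least one decoy; the choice of ROC AUC as the quantitative metric and all function names are assumptions for illustration.

```python
from collections import Counter

def weighted_auc(ranking, weights):
    """ROC AUC in which each active contributes in proportion to its weight.

    ranking: chemotype labels ordered best-first, None for decoys.
    weights: one weight per active, in the order actives appear in `ranking`."""
    n_decoys = sum(1 for c in ranking if c is None)
    decoys_seen, total, norm, k = 0, 0.0, 0.0, 0
    for c in ranking:
        if c is None:
            decoys_seen += 1
        else:
            # fraction of decoys ranked below this active
            total += weights[k] * (n_decoys - decoys_seen) / n_decoys
            norm += weights[k]
            k += 1
    return total / norm

def cluster_average_weights(ranking):
    """Weight 1/n for each active in a chemotype with n actives."""
    sizes = Counter(c for c in ranking if c is not None)
    return [1.0 / sizes[c] for c in ranking if c is not None]

def first_found_weights(ranking):
    """Weight 1 for the first active of each chemotype, 0 for the rest."""
    seen, weights = set(), []
    for c in ranking:
        if c is not None:
            weights.append(0.0 if c in seen else 1.0)
            seen.add(c)
    return weights

ranking = ["A", None, None, "A", "A", "B", None, None]
print(weighted_auc(ranking, [1.0] * 4))                           # 0.625
print(weighted_auc(ranking, cluster_average_weights(ranking)))    # about 0.583
print(weighted_auc(ranking, first_found_weights(ranking)))        # 0.75
```

Here the first-found score ignores the two lower-ranked A actives entirely, whereas cluster averaging still counts them, each at one third of the weight of the single B active.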
Mass transport to micrometer-sized electrodes in a microjet (wall-tube) electrode configuration is examined experimentally and through finite element modeling. Electrochemical imaging experiments reveal that local mass transport is highly sensitive to the lateral position of the nozzle relative to the electrode. When the two are arranged coaxially, the mass transfer rate to the electrode, as determined from transport-limited current measurements, shows a pronounced minimum. Small lateral displacements of the nozzle from the coaxial position at first increase mass transport, with the current reaching a maximum at a displacement of around one nozzle radius (50 μm). For larger lateral displacements, the limiting current gradually decreases with increasing distance. The implications of these observations for practical applications of the microjet electrode are considered. Voltammetric measurements of the oxidation of IrCl₆³⁻ in aqueous solution, with the electrode and nozzle coaxial, are in good agreement with simulations of mass transport. Increasing the solution viscosity dramatically decreases mass transport to the electrode, mainly because it reduces the diffusion coefficient of the redox species.
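The link between viscosity and diffusion coefficient can be rationalized with the Stokes-Einstein relation, under the assumption (not stated in the abstract) that the redox species diffuses as a sphere of hydrodynamic radius r in a continuum solvent of viscosity η:

```latex
D = \frac{k_B T}{6 \pi \eta r}
\qquad\Longrightarrow\qquad
\frac{D_2}{D_1} = \frac{\eta_1}{\eta_2} \quad (\text{fixed } T,\ r)
```

For example, doubling the viscosity at constant temperature halves D. How strongly the limiting current then falls also depends on the convective flow from the nozzle, which is itself affected by viscosity.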
We consider a Bayesian methodology for comparing two or more unlabeled point sets. Applying the technique to a set of steroid molecules illustrates its potential utility for comparing molecules in chemoinformatics and bioinformatics. We first match a pair of molecules, one regarded as random and the other as fixed. A type of mixture model is proposed for the point set coordinates; the parameters of the distribution are a labeling matrix (indicating which pairs of points match) and a concentration parameter. An important property of the likelihood is that it is invariant under rotations and translations of the data. Bayesian inference for the parameters is carried out using Markov chain Monte Carlo simulation, and the procedure is shown to work well on the steroid data. Because the posterior distribution has multiple local modes, it is difficult to simulate from, so we also use additional data (partial charges on atoms) to help with this task. An approximation for speeding up the simulation algorithm is considered, and for our data this fast approximate algorithm leads to essentially the same inference as the exact method. Extensions to multiple molecule alignment are also introduced, and an algorithm that also works well on the steroid data set is described. After all the steroid molecules have been matched, exploratory data analysis is carried out to examine which molecules are similar. Further Bayesian inference for the multiple alignment problem is also considered.
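The sketch below is not the mixture model described above: it swaps in a simpler likelihood built from inter-point distances, which is one easy way to obtain invariance under rotations and translations (it is also invariant under reflections, so it cannot tell mirror images apart). It assumes equal-sized point sets with every point matched, a uniform prior over matchings, and a Metropolis sampler with swap proposals; `kappa` plays a role loosely analogous to a concentration parameter. All names are hypothetical.

```python
import math
import random

def pairwise_distances(points):
    return [[math.dist(p, q) for q in points] for p in points]

def log_likelihood(dist_a, dist_b, matching, kappa):
    """Log-likelihood (up to a constant) of matching point i in A to point
    matching[i] in B, penalising squared differences in inter-point distances."""
    n = len(matching)
    sse = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            sse += (dist_a[i][j] - dist_b[matching[i]][matching[j]]) ** 2
    return -kappa * sse

def sample_matchings(points_a, points_b, kappa=1.0, n_iter=5000, seed=0):
    """Metropolis sampler over one-to-one matchings; returns the
    highest-likelihood matching visited and its log-likelihood."""
    rng = random.Random(seed)
    dist_a, dist_b = pairwise_distances(points_a), pairwise_distances(points_b)
    current = list(range(len(points_a)))
    current_ll = log_likelihood(dist_a, dist_b, current, kappa)
    best, best_ll = current[:], current_ll
    for _ in range(n_iter):
        i, j = rng.sample(range(len(current)), 2)
        proposal = current[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]   # symmetric swap move
        prop_ll = log_likelihood(dist_a, dist_b, proposal, kappa)
        if rng.random() < math.exp(min(0.0, prop_ll - current_ll)):
            current, current_ll = proposal, prop_ll
            if current_ll > best_ll:
                best, best_ll = current[:], current_ll
    return best, best_ll

# Toy example: B is A rotated 90 degrees about z, translated, and relabelled.
A = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
B = [(3, 5, 5), (5, 5, 5), (5, 5, 8), (5, 6, 5)]
best, best_ll = sample_matchings(A, B, kappa=0.2, n_iter=5000, seed=1)
print(best, best_ll)
```

A small `kappa` flattens the posterior so the chain can escape local modes; larger values sharpen it but make mixing harder, which mirrors the difficulty the abstract describes.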
Quantitative Structure-Selectivity Relationships (QSSR) are developed for a library of 40 phase-transfer asymmetric catalysts based on quaternary ammonium salts, using Comparative Molecular Field Analysis (CoMFA) and closely related variants. Because these catalysts are flexible, we use molecular dynamics (MD) with an implicit Generalized Born solvent model to explore their conformational space. Comparison with crystal data indicates that relevant conformations are obtained and, furthermore, that the correct biphenyl twist conformation is predicted, as illustrated by the superiority of the resulting model (leave-one-out q² = 0.78) over a random choice of low-energy conformations for each catalyst (average q² = 0.22). We extend this model by incorporating the MD trajectory directly into a 4D QSSR and by Boltzmann-weighting the contribution of selected minimized conformations, an approach we refer to as '3.5D' QSSR. The latter method improves on the predictive ability of the 3D QSSR (leave-one-out q² = 0.83), as confirmed by repeated training/test splits.
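The abstract does not specify how the Boltzmann weights enter the model. One plausible reading, assumed here, is to average each catalyst's field descriptors over its selected conformers with weights proportional to exp(−ΔE/RT). The sketch also shows the standard leave-one-out q² = 1 − PRESS/SS used to report the models. Energies in kcal/mol, T = 298.15 K, and the short descriptor vectors (standing in for CoMFA grid values) are assumptions for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)

def boltzmann_weights(energies, temperature=298.15):
    """Normalised Boltzmann weights for conformer energies (kcal/mol; only
    relative values matter, so the minimum is subtracted for stability)."""
    e_min = min(energies)
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies]
    z = sum(factors)
    return [f / z for f in factors]

def weighted_descriptor(conformer_descriptors, energies, temperature=298.15):
    """Boltzmann-weighted average of per-conformer field descriptor vectors."""
    w = boltzmann_weights(energies, temperature)
    n = len(conformer_descriptors[0])
    return [sum(wk * desc[i] for wk, desc in zip(w, conformer_descriptors))
            for i in range(n)]

def q_squared(observed, loo_predicted):
    """Cross-validated q^2 = 1 - PRESS / SS, using leave-one-out predictions."""
    mean = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, loo_predicted))
    ss = sum((y - mean) ** 2 for y in observed)
    return 1.0 - press / ss

# Two hypothetical conformers; the second lies 1 kcal/mol above the first.
descs = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
print(weighted_descriptor(descs, [0.0, 1.0]))
print(q_squared([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))   # 0.75
```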
We report a general numerical strategy, based on finite element modeling, for simulating the hydrodynamics and mass transfer at a wall-tube electrode. Previous empirical and approximate analytical treatments that predict the current as a function of flow rate and cell geometry are critically assessed and shown to have limited general applicability. Good agreement between experiment and simulation is found, providing a rigorous basis for future work with these electrodes.