A computational approach is described that can predict the steady-state volume of distribution (VD(ss)) of new compounds in humans with an accuracy within 2-fold of the actual value. A dataset of VD values for 384 drugs in humans was used to train a hybrid mixture discriminant analysis-random forest (MDA-RF) model using 31 computed descriptors. Descriptors included terms describing lipophilicity, ionization, molecular volume, and various molecular fragments. For a test set of 23 proprietary compounds not used in model construction, the geometric mean fold-error (GMFE) was 1.78-fold (±11.4%). The model was also tested using a leave-class-out approach in which subsets of drugs based on therapeutic class were removed from the training set of 384, the model was rebuilt, and the VD(ss) values for each subset were predicted. GMFE values ranged from 1.46 to 2.94-fold, depending on the subset. Finally, for an additional set of 74 compounds, VD(ss) predictions made using the computational model were compared to predictions made using previously described methods dependent on animal pharmacokinetic data. Computational VD(ss) predictions were, on average, 2.13-fold different from the VD(ss) predictions from animal data. The computational model described can predict human VD(ss) with an accuracy comparable to predictions requiring substantially greater effort and can be applied in place of animal experimentation.
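The abstract above reports accuracy as a geometric mean fold-error but does not spell out the formula. The sketch below assumes the commonly used definition, GMFE = 10^(mean |log10(predicted/observed)|); the function name `gmfe` and the sample values are illustrative only and are not taken from the paper.

```python
import math


def gmfe(predicted, observed):
    """Geometric mean fold-error, assuming the common definition
    10 ** mean(|log10(predicted / observed)|).

    Both inputs are sequences of positive values in the same units.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_log_ratios = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(abs_log_ratios) / len(abs_log_ratios))


# Illustrative values only: one 2-fold over-prediction, one 2-fold under-prediction.
example = gmfe([2.0, 0.5], [1.0, 1.0])
```

Because the absolute value is taken on the log scale, over- and under-predictions by the same factor count equally, so a GMFE of 2 means predictions are off by a factor of two on average in either direction.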
Linear discriminant analysis is used to generate models to classify multidrug-resistance reversal (MDRR) agents based on activity. Models are generated and evaluated using MDRR activity values for 609 compounds measured using adriamycin-resistant P388 murine leukemia cells. Structure-based descriptors numerically encode molecular features that are used in model formation. Two types of models are generated: one type to classify compounds as inactive, moderately active, and active (three-class problem) and one type to classify compounds as inactive or active without considering the moderately active class (two-class problem). Two activity distributions are considered, where the separation between inactive and active compounds is different. When the separation between inactive and active classes is small, a model based on nine topological descriptors is developed that produces a classification rate of 83.1% correct for an external prediction set. Larger separation between active and inactive classes raises the prediction set classification rate to 92.0% correct using a model with six topological descriptors. Models are further validated through Monte Carlo experiments in which models are generated after class labels have been scrambled. The classification rates achieved demonstrate that the models developed could serve as a screening mechanism to identify potentially useful MDRR agents from large libraries of compounds.
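The label-scrambling validation mentioned above can be illustrated with a small sketch. It is not the authors' procedure: a one-dimensional nearest-centroid classifier stands in for linear discriminant analysis, the data are synthetic, accuracy is measured on the training data rather than on an external prediction set, and all function names are hypothetical.

```python
import random


def fit_centroids(xs, labels):
    """Mean feature value per class (a simple stand-in for LDA)."""
    centroids = {}
    for c in set(labels):
        members = [x for x, y in zip(xs, labels) if y == c]
        centroids[c] = sum(members) / len(members)
    return centroids


def accuracy(xs, labels, centroids):
    """Fraction of points whose nearest centroid matches their label."""
    hits = 0
    for x, y in zip(xs, labels):
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        hits += predicted == y
    return hits / len(xs)


def scrambled_accuracies(xs, labels, n_trials=200, seed=0):
    """Refit the model on randomly permuted labels n_trials times."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        results.append(accuracy(xs, shuffled, fit_centroids(xs, shuffled)))
    return results


# Synthetic, well-separated toy data: 10 "inactive" and 10 "active" points.
xs = [i / 10 for i in range(10)] + [5 + i / 10 for i in range(10)]
labels = [0] * 10 + [1] * 10

true_acc = accuracy(xs, labels, fit_centroids(xs, labels))
chance = scrambled_accuracies(xs, labels)
mean_chance = sum(chance) / len(chance)
```

If the classification rate with the real labels is not clearly above the distribution obtained with scrambled labels, the model is likely fitting chance correlations rather than structure-activity information.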
The energetics of rotation around single bonds (torsions) is a key determinant of the three-dimensional shape that druglike molecules adopt in solution, the solid state, and in different biological environments, which in turn defines their unique physical and pharmacological properties. Therefore, accurate characterization of torsion angle preference and energetics is essential for the success of computational drug discovery and design. Here, we analyze torsional strain in crystal structures of druglike molecules in the Cambridge Structural Database (CSD) and bioactive ligand conformations in the Protein Data Bank (PDB), expressing the total strain energy as a sum of strain energy from constituent rotatable bonds. We utilized cloud computing to generate torsion scan profiles of a very large collection of chemically diverse neutral fragments at DFT(B3LYP)/6-31G*//6-31G** or DFT(B3LYP)/6-31+G*//6-31+G** (for sulfur-containing molecules). With the data generated from these ab initio calculations, we performed rigorous analysis of strain due to deviation of observed torsion angles relative to their ideal gas-phase geometries. Contrary to the previous studies based on molecular mechanics, we find that in the crystalline state, molecules generally adopt low-strain conformations, with median per-torsion strain energy in CSD and PDB under one-tenth and one-third of a kcal/mol, respectively. However, for a small fraction (<5%) of motifs, external effects such as steric hindrance and hydrogen bonds result in a strain penalty exceeding 2.5 kcal/mol. We find that, due to the generally poor quality of PDB structures, bioactive structures tend to have higher torsional strain compared to small-molecule crystal conformations. However, in the absence of structural fitting artifacts in PDB structures, protein-induced strain in bioactive conformations is quantitatively similar to that due to the packing forces in small-molecule crystal structures.
This analysis allows us to establish strain energy thresholds to help identify biologically relevant conformers in a given ensemble. The work presented here is the most comprehensive study to date that demonstrates the utility and feasibility of gas-phase quantum mechanics (QM) calculations to study conformational preference and energetics of drug-size molecules. Potential applications of this study in computational lead discovery and structure-based design are discussed.
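The additive strain model described in this abstract (total strain as a sum over rotatable bonds, each measured relative to the minimum of its gas-phase torsion scan) can be sketched as follows. This is a minimal illustration under stated assumptions: the scan is a coarse grid of (angle, energy) points with linear, periodic interpolation between them, the energy values are hypothetical and not taken from the paper, and the function names are invented for this example.

```python
def torsion_strain(scan, angle_deg):
    """Strain (same units as the scan energies) of one torsion at angle_deg.

    scan: list of (angle_in_degrees, energy) points within [0, 360).
    Strain is the linearly interpolated energy minus the scan minimum.
    """
    pts = sorted(scan)
    e_min = min(e for _, e in pts)
    # Close the period by repeating the first point one full turn later.
    pts = pts + [(pts[0][0] + 360.0, pts[0][1])]
    a = angle_deg % 360.0
    if a < pts[0][0]:
        a += 360.0
    for (a0, e0), (a1, e1) in zip(pts, pts[1:]):
        if a0 <= a <= a1:
            t = (a - a0) / (a1 - a0)
            return e0 + t * (e1 - e0) - e_min
    raise ValueError("angle not covered by scan")


def total_strain(scans, angles_deg):
    """Sum of per-torsion strain over all rotatable bonds."""
    return sum(torsion_strain(s, a) for s, a in zip(scans, angles_deg))


# Hypothetical scan profile in kcal/mol on a 60-degree grid (illustration only).
scan = [(0, 5.0), (60, 0.9), (120, 3.5), (180, 0.0), (240, 3.5), (300, 0.9)]
strain = total_strain([scan, scan], [180.0, 30.0])
```

A conformer could then be compared against a chosen strain threshold; the threshold values themselves are established in the paper and are not reproduced here.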
We introduce a class of partial atomic charge assignment methods that provides an ab initio quality description of the electrostatics of bioorganic molecules. The method uses a set of models that neither have a fixed functional form nor require a fixed set of parameters, and therefore are capable of capturing the complexities of the charge distribution in great detail. Random Forest regression is used to build separate charge models for elements H, C, N, O, F, S, and Cl, using training data consisting of partial charges along with a description of their surrounding chemical environments; training set charges are generated by fitting to the B3LYP/6-31G* electrostatic potential (ESP) and are subsequently refined to improve consistency and transferability of the charge assignments. Using a set of 210 neutral, small organic molecules, the absolute hydration free energy calculated using these charges in conjunction with the Generalized Born solvation model shows a low mean unsigned error, close to 1 kcal/mol, from the experimental data. Using another large and independent test set of chemically diverse organic molecules, the method is shown to accurately reproduce charge-dependent observables (ESP and dipole moment) from ab initio calculations. The method presented here automatically provides an estimate of potential errors in the charge assignment, enabling systematic improvement of these models using additional data. This work has implications not only for the future development of charge models but also in developing methods to describe many other chemical properties that require accurate representation of the electronic structure of the system.
Quantitative structure-property relationships are developed using multiple linear regression and computational neural networks (CNNs). Structure-based descriptors are used to numerically encode molecular features that can be used to form models describing reaction rates with hydroxyl radicals. For a set of 57 unsaturated hydrocarbons, a 5-2-1 CNN was developed that produced a root-mean-square (rms) error of 0.0638 log units for the training set and 0.0657 log units for an external prediction set. The residual sum of squares for all 57 compounds was 0.234 log units, which compares very favorably with existing methodologies. Additionally, a 10-7-1 CNN was built to predict hydroxyl radical rate constants for a diverse set of 312 compounds. The training set rms error was 0.229 log units, and the rms error for the external prediction set was 0.254 log units. This model demonstrates the ability to provide accurate predictions over a wide range of functionalities.
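For readers unfamiliar with the "5-2-1" and "10-7-1" notation, the numbers give the neurons in the input, hidden, and output layers. The sketch below counts adjustable parameters under the assumption of a fully connected feed-forward network with one bias per hidden and output neuron (the abstract does not state this), and shows the root-mean-square error used to report model quality. Function names are illustrative.

```python
import math


def n_adjustable_parameters(layers):
    """Weights plus biases for a fully connected feed-forward network.

    layers: neuron counts per layer, e.g. [5, 2, 1].
    Assumes one bias term for every non-input neuron.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))


def rms_error(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


small_net = n_adjustable_parameters([5, 2, 1])    # 15 under the stated assumption
large_net = n_adjustable_parameters([10, 7, 1])   # 85 under the stated assumption
```

Comparing the parameter count with the number of training compounds is a quick way to judge how much risk of overfitting a given architecture carries.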
Fragment-based drug discovery (FBDD) continues to advance as an efficient alternative screening paradigm for the identification and optimization of novel chemical matter. To enable FBDD across a wide range of pharmaceutical targets, a fragment screening library must be chemically diverse and synthetically expandable to support critical decision making for chemical follow-up and for assessing new target druggability. In this manuscript, the Pfizer fragment library design strategy, which utilized multiple orthogonal metrics to incorporate structure, pharmacophore, and pharmacological space diversity, is described. Appropriate measures of molecular complexity were also employed to maximize the probability of detection of fragment hits using a variety of biophysical and biochemical screening methods. In addition, structural integrity, purity, solubility, fragment and analog availability, and cost were important considerations in the selection process. Preliminary analysis of primary screening results for 13 targets using NMR Saturation Transfer Difference (STD) indicates the identification of µM to mM hits and the uniqueness of hits at weak binding affinities for these targets.