Over the past decade, pharmaceutical companies have seen a decline in the number of drug candidates successfully passing through clinical trials, though billions are still spent on drug development. Poor aqueous solubility leads to low bioavailability, reducing pharmaceutical effectiveness. The human cost of inefficient drug candidate testing is of great medical concern: fewer drugs reach the production line, which slows the development of new treatments. In biochemistry and biophysics, water-mediated reactions and interactions within active sites and protein pockets are an active area of research, in which methods for modelling solvated systems are continually pushed to their limits. Here, we discuss a range of methods for solvent modelling and solubility prediction, aiming to inform the reader of the options available and outlining the advantages and disadvantages of each approach.
We demonstrate that the intrinsic aqueous solubility of crystalline druglike molecules can be estimated with reasonable accuracy from sublimation free energies calculated using crystal lattice simulations and hydration free energies calculated using the 3D Reference Interaction Site Model (3D-RISM) of the Integral Equation Theory of Molecular Liquids (IET). The solubilities of 25 crystalline druglike molecules taken from different chemical classes are predicted by the model with a correlation coefficient of R = 0.85 and a root mean square error (RMSE) of 1.45 log10 S units, which is significantly more accurate than results obtained using implicit continuum solvent models. The method is not directly parametrized against experimental solubility data, and it offers a full computational characterization of the thermodynamics of transfer of the drug molecule from crystal phase to gas phase to dilute aqueous solution.
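The thermodynamic cycle described above (crystal to gas to dilute aqueous solution) can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: it assumes both free energies are given in kJ/mol with a 1 mol/L standard state, so that ΔG_sol = ΔG_sub + ΔG_hyd = -RT ln S. The function name and the example values are hypothetical.

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3  # kJ mol^-1 K^-1


def log10_solubility(dg_sub, dg_hyd, temperature=298.15):
    """Estimate log10 S from sublimation and hydration free energies.

    Assumption (not stated in the source): both inputs are in kJ/mol and
    refer to a 1 mol/L standard state, so S comes out in mol/L.
    """
    # Sum the two legs of the cycle: crystal -> gas -> solution.
    dg_sol = dg_sub + dg_hyd
    # dG_sol = -RT ln S  =>  log10 S = -dG_sol / (RT ln 10)
    return -dg_sol / (GAS_CONSTANT_KJ * temperature * math.log(10))


if __name__ == "__main__":
    # Illustrative values only, not taken from any dataset.
    print(round(log10_solubility(dg_sub=60.0, dg_hyd=-40.0), 2))
```

Under these assumptions, an error of about 5.7 kJ/mol in the summed free energy at 298 K corresponds to one log10 S unit, which gives a sense of how demanding the individual free-energy calculations are.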
We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantitative structure–property relationship (QSPR) models. We also develop machine learning models where the theoretical energies and cheminformatics descriptors are used as combined input. These models are used to predict solvation free energy. While direct theoretical calculation does not give accurate results in this approach, machine learning is able to give predictions with a root mean squared error (RMSE) of ∼1.1 log S units in a 10-fold cross-validation for our Drug-Like-Solubility-100 (DLS-100) dataset of 100 druglike molecules. We find that a model built using energy terms from our theoretical methodology as descriptors is marginally less predictive than one built on Chemistry Development Kit (CDK) descriptors. Combining both sets of descriptors allows a further but very modest improvement in the predictions. However, in some cases, this is a statistically significant enhancement. These results suggest that there is little complementarity between the chemical information provided by these two sets of descriptors, despite their different sources and methods of calculation. Our machine learning models are also able to predict the well-known Solubility Challenge dataset with an RMSE value of 0.9–1.0 log S units.
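The cross-validated RMSE quoted above can be illustrated with a small standard-library sketch. It is only an assumption-laden stand-in: a one-descriptor least-squares line replaces the machine learning models of the source, and the names `fit_line` and `kfold_rmse` are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def kfold_rmse(xs, ys, k=10, seed=0):
    """RMSE over all held-out predictions from k-fold cross-validation."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    squared_error = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Score only molecules the model did not see during fitting.
        squared_error += sum((ys[i] - (slope * xs[i] + intercept)) ** 2
                             for i in held_out)
    return math.sqrt(squared_error / len(xs))
```

Each molecule is predicted exactly once, by a model that never saw it, so the pooled RMSE is an estimate of out-of-sample error rather than of fit quality.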
The Interacting Quantum Atoms (IQA) method is used to analyze the correlated part of the Møller-Plesset (MP) perturbation theory two-particle density matrix. Such an analysis determines the effects of electron correlation within atoms and between atoms, which covers both bonds and nonbonded through-space atom-atom interactions within a molecule or molecular complex. Electron correlation lowers the energy of the atoms at either end of a bond, but for the bond itself, it can be stabilizing or destabilizing. Bonds are described in a two-dimensional world of exchange and charge transfer, where covalency is not the opposite of ionicity.
We wished to compile a data set of results from the experimental literature to support the development and validation of accurate computational models (force fields) for an important class of micelle-forming nonionic surfactant compounds, the poly(ethylene oxide) alkyl ethers, usually denoted CnEm. However, careful examination of the experimental literature exposed a striking degree of variation in values reported for critical micelle concentrations (cmc) and mean aggregation numbers (N_agg). This variation was so large that it masked important trends known to exist within this family of molecules, thereby rendering most of the literature data of limited utility for force field development. In this work, we describe some reasons for the wide variability in the experimental literature, and we present a set of cmc and aggregation number data for 12 CnEm compounds that we feel is appropriate to use for the construction and validation of computational models. The cmc values we selected are from the existing experimental literature and represent a carefully chosen and consistent subset that conveys important trends seen by many of the experimental studies. However, for a corresponding and consistent set of weight-averaged aggregation numbers, we needed to perform new dynamic light scattering (DLS) experiments. The results of these experiments were carefully analyzed to obtain not just mean aggregation numbers but also the underlying micelle size distribution functions. Several trends observed in the cmc and N_agg observables are highlighted and serve as challenges for developers of force field and simulation methodology.
The analysis of the DLS experiments accounts for the fact that a broad distribution of micelle sizes exists for many of these compounds and that one must be careful to use the appropriate weighted averages (e.g., mass-weighted vs number-weighted averages) in comparing results from different types of experiments and in comparing results from experiments with those from simulations.
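The difference between number-weighted and mass-weighted averages can be made concrete with a short sketch. The function name and the toy distribution are hypothetical; the input is assumed to map each aggregation number to the relative number of micelles of that size.

```python
def aggregation_averages(population):
    """Return (number-weighted, mass-weighted) mean aggregation numbers.

    `population` maps aggregation number N -> relative count of micelles
    containing N surfactant molecules (assumed input format).
    """
    total_micelles = sum(population.values())
    total_molecules = sum(n * count for n, count in population.items())
    # Number average: every micelle counts equally.
    number_avg = total_molecules / total_micelles
    # Mass (weight) average: every surfactant molecule counts equally,
    # so larger micelles contribute in proportion to their size.
    mass_avg = sum(n * n * count
                   for n, count in population.items()) / total_molecules
    return number_avg, mass_avg


if __name__ == "__main__":
    # Toy bimodal distribution, for illustration only.
    print(aggregation_averages({50: 0.5, 100: 0.5}))
```

For any distribution with nonzero width the mass-weighted value exceeds the number-weighted one, so comparing a light-scattering result with a simulation average of the other kind introduces a systematic offset.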
We compare a range of computational methods for the prediction of sublimation thermodynamics (enthalpy, entropy and free energy of sublimation). These include a model from theoretical chemistry that utilizes crystal lattice energy minimization (with the DMACRYS program) and QSPR models generated by both machine learning (Random Forest and Support Vector Machines) and regression (Partial Least Squares) methods. Using these methods we investigate the predictability of the enthalpy, entropy and free energy of sublimation, with consideration of whether such a method may be able to improve solubility prediction schemes. Previous work has suggested that the major source of error in solubility prediction schemes involving a thermodynamic cycle via the solid state is in the modeling of the free energy change away from the solid state. Yet contrary to this conclusion other work has found that the inclusion of terms such as the enthalpy of sublimation in QSPR methods does not improve the predictions of solubility. We suggest the use of theoretical chemistry terms, detailed explicitly in the methods section, as descriptors for the prediction of the enthalpy and free energy of sublimation. A dataset of 158 molecules with experimental sublimation thermodynamics values and some CSD refcodes has been collected from the literature and is provided with their original source references.
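The three sublimation quantities named above are linked by the standard relation ΔG_sub = ΔH_sub - TΔS_sub. The sketch below only illustrates that relation; the function name, units, and example values are assumptions and do not come from the dataset described in the source.

```python
def sublimation_free_energy(dh_sub, ds_sub, temperature=298.15):
    """Gibbs free energy of sublimation from enthalpy and entropy.

    Assumed units: dh_sub in kJ/mol, ds_sub in kJ/(mol K), temperature in K.
    """
    return dh_sub - temperature * ds_sub


if __name__ == "__main__":
    # Illustrative values only.
    print(sublimation_free_energy(dh_sub=100.0, ds_sub=0.2))
```

Because the entropic term is a large fraction of the enthalpy at room temperature, an error in either predicted quantity carries straight through to the free energy.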
We present an innovative method for predicting the dynamic electron correlation energy of an atom or a bond in a molecule utilizing topological atoms. Our approach uses the machine learning method Kriging (Gaussian Process Regression with a non-zero mean function) to predict these dynamic electron correlation energy contributions. The true energy values are calculated by partitioning the MP2 two-particle density matrix via the Interacting Quantum Atoms (IQA) procedure. To our knowledge, this is the first time such energies have been predicted by a machine learning technique. We present here three important proof-of-concept cases: the water monomer, the water dimer, and the van der Waals complex H2···He. These cases represent the final step toward the design of a full IQA potential for molecular simulation. This final piece will enable us to consider situations in which dispersion is the dominant intermolecular interaction. The results from these examples suggest a new method by which dispersion potentials for molecular simulation can be generated.
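Kriging with a constant, non-zero mean can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the model used in the source: the kernel is a fixed-length-scale squared exponential, the mean is taken as the plain average of the training energies, and all names are hypothetical.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential covariance between two feature vectors."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def krige(train_x, train_y, query, length=1.0, jitter=1e-10):
    """Predict at `query` as mean + k*^T K^-1 (y - mean)."""
    # Simplification: sample mean instead of a fitted mean function.
    mean = sum(train_y) / len(train_y)
    cov = [[rbf(a, b, length) + (jitter if i == j else 0.0)
            for j, b in enumerate(train_x)]
           for i, a in enumerate(train_x)]
    weights = solve(cov, [y - mean for y in train_y])
    return mean + sum(w * rbf(query, x, length)
                      for w, x in zip(weights, train_x))
```

With negligible jitter the predictor reproduces the training energies at the training geometries and relaxes to the mean far from them, which is why the choice of mean matters when extrapolating.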