Recent modifications and improvements to standard nucleic acid force fields have attempted to fix problems and issues that have been observed as longer timescale simulations have become routine. Although previous work has shown the ability to fold the UUCG stem-loop structure, until now no group has attempted to quantify the performance of current force fields using highly converged structural populations of the tetraloop conformational ensemble. In this study, we report the use of multiple independent sets of multidimensional replica exchange molecular dynamics (M-REMD) simulations with different initial conditions to generate well-converged conformational ensembles for the tetranucleotides r(GACC) and r(CCCC), as well as the larger UUCG tetraloop motif. By generating what is to our knowledge the most complete RNA structure ensembles reported to date for these systems, we remove the coupling between force field errors and errors due to incomplete sampling, providing a comprehensive comparison between current top-performing MD force fields for RNA. Of the RNA force fields tested in this study, none demonstrate the ability to correctly identify the most thermodynamically stable structure for all three systems. We discuss the deficiencies present in each potential function and suggest areas where improvements can be made. The results imply that although "short" (nsec-μsec timescale) simulations may stay close to their respective experimental structures and may well reproduce experimental observables, inevitably the current force fields will populate alternative incorrect structures that are more stable than those observed via experiment.
We present a strategy for carrying out high-precision calculations of binding free energy and binding enthalpy values from molecular dynamics simulations with explicit solvent. The approach is used to calculate the thermodynamic profiles for binding of nine small molecule guests to either the cucurbit[7]uril (CB7) or β-cyclodextrin (βCD) host. For these systems, calculations using commodity hardware can yield binding free energy and binding enthalpy values with a precision of ∼0.5 kcal/mol (95% CI) in a matter of days. Crucially, the self-consistency of the approach is established by calculating the binding enthalpy directly, via end point potential energy calculations, and indirectly, via the temperature dependence of the binding free energy, i.e., by the van’t Hoff equation. Excellent agreement between the direct and van’t Hoff methods is demonstrated for both host–guest systems and an ion-pair model system for which particularly well-converged results are attainable. Additionally, we find that hydrogen mass repartitioning allows marked acceleration of the calculations with no discernible cost in precision or accuracy. Finally, we provide guidance for accurately assessing numerical uncertainty of the results in settings where complex correlations in the time series can pose challenges to statistical analysis. The routine nature and high precision of these binding calculations opens the possibility of including measured binding thermodynamics as target data in force field optimization so that simulations may be used to reliably interpret experimental data and guide molecular design.
A necessary step to properly assess and validate the performance of force fields for biomolecules is to exhaustively sample the accessible conformational space, which is challenging for large RNA structures. Given questions regarding the reliability of modeling RNA structure and dynamics with current methods, we have begun to use RNA tetranucleotides to evaluate force fields. These systems, though small, display considerable conformational variability and complete sampling with standard simulation methods remains challenging. Here we compare and discuss the performance of known variations of replica exchange molecular dynamics (REMD) methods, specifically temperature REMD (T-REMD), Hamiltonian REMD (H-REMD), and multidimensional REMD (M-REMD) methods, which have been implemented in Amber’s accelerated GPU code. Using two independent simulations, we show that M-REMD not only makes very efficient use of emerging large-scale GPU clusters, like Blue Waters at the University of Illinois, but also is critically important in generating the converged ensemble more efficiently than either T-REMD or H-REMD. With 57.6 μs aggregate sampling of a conformational ensemble with M-REMD methods, the populations can be compared to NMR data to evaluate force field reliability and further understand how putative changes to the force field may alter populations to be in more consistent agreement with experiment.
The ability to computationally predict protein-small molecule binding affinities with high accuracy would accelerate drug discovery and reduce its cost by eliminating rounds of trial-and-error synthesis and experimental evaluation of candidate ligands. As academic and industrial groups work toward this capability, there is an ongoing need for datasets that can be used to rigorously test new computational methods. Although protein–ligand data are clearly important for this purpose, their size and complexity make it difficult to obtain well-converged results and to troubleshoot computational methods. Host–guest systems offer a valuable alternative class of test cases, as they exemplify noncovalent molecular recognition but are far smaller and simpler. As a consequence, host–guest systems have been part of the prior two rounds of SAMPL prediction exercises, and they also figure in the present SAMPL5 round. In addition to being blinded, and thus avoiding biases that may arise in retrospective studies, the SAMPL challenges have the merit of focusing multiple researchers on a common set of molecular systems, so that methods may be compared and ideas exchanged. The present paper provides an overview of the host–guest component of SAMPL5, which centers on three different hosts, two octa-acids and a glycoluril-based molecular clip, and two different sets of guest molecules, in aqueous solution. A range of methods were applied, including electronic structure calculations with implicit solvent models; methods that combine empirical force fields with implicit solvent models; and explicit solvent free energy simulations. The most reliable methods tend to fall in the latter class, consistent with results in prior SAMPL rounds, but the level of accuracy is still below that sought for reliable computer-aided drug design. Advances in force field accuracy, modeling of protonation equilibria, electronic structure methods, and solvent models, hold promise for future improvements.
Approaches for computing small molecule binding free energies based on molecular simulations are now regularly being employed by academic and industry practitioners to study receptor-ligand systems and prioritize the synthesis of small molecules for ligand design. Given the variety of methods and implementations available, it is natural to ask how the convergence rates and final predictions of these methods compare. In this study, we describe the concept and results for the SAMPL6 SAMPLing challenge, the first challenge from the SAMPL series focusing on the assessment of convergence properties and reproducibility of binding free energy methodologies. We provided parameter files, partial charges, and multiple initial geometries for two octa-acid (OA) and one cucurbit[8]uril (CB8) host-guest systems. Participants submitted binding free energy predictions as a function of the number of force and energy evaluations for seven different alchemical and physical-pathway (i.e., potential of mean force and weighted ensemble of trajectories) methodologies implemented with the GROMACS, AMBER, NAMD, or OpenMM simulation engines. To rank the methods, we developed an efficiency statistic based on bias and variance of the free energy estimates. For the two small OA binders, the free energy estimates computed with alchemical and potential of mean force approaches show relatively similar variance and bias as a function of the number of energy/force evaluations, with the attach-pull-release (APR), GROMACS expanded ensemble, and NAMD double decoupling submissions obtaining the greatest efficiency. The differences between the methods increase when analyzing the CB8-quinine system, where both the guest size and correlation times for system dynamics are greater. For this system, nonequilibrium switching (GROMACS/NS-DS/SB) obtained the overall highest efficiency. Surprisingly, the results suggest that specifying force field parameters and partial charges is insufficient to generally ensure reproducibility, and we observe differences between seemingly converged predictions ranging approximately from 0.3 to 1.0 kcal/mol, even with almost identical simulations parameters and system setup (e.g., Lennard-Jones cutoff, ionic composition). Further work will be required to completely identify the exact source of these discrepancies. Among the conclusions emerging from the data, we found that Hamiltonian replica exchange-while displaying very small variance-can be affected by a slowly-decaying bias that depends on the initial population of the replicas, that bidirectional estimators are significantly more efficient than unidirectional estimators for nonequilibrium free energy calculations for systems considered, and that the Berendsen barostat introduces non-negligible artifacts in expanded ensemble simulations.
We used microsecond time scale molecular dynamics simulations to compute, at high precision, binding enthalpies for cucurbit[7]uril (CB7) with eight guests in aqueous solution. The results correlate well with experimental data from previously published isothermal titration calorimetry studies, and decomposition of the computed binding enthalpies by interaction type provides plausible mechanistic insights. Thus, dispersion interactions appear to play a key role in stabilizing these complexes, due at least in part to the fact that their packing density is greater than that of water. On the other hand, strongly favorable Coulombic interactions between the host and guests are compensated by unfavorable solvent contributions, leaving relatively modest electrostatic contributions to the binding enthalpies. The better steric fit of the aliphatic guests into the circular host appears to explain why their binding enthalpies tend to be more favorable than those of the more planar aromatic guests. The present calculations also bear on the validity of the simulation force field. Somewhat unexpectedly, the TIP3P water yields better agreement with experiment than the TIP4P-Ew water model, although the latter is known to replicate the properties of pure water more accurately. More broadly, the present results demonstrate the potential for computational calorimetry to provide atomistic explanations for thermodynamic observations.
Molecular dynamics force field development and assessment requires a reliable means for obtaining a well-converged conformational ensemble of a molecule in both a time-efficient and cost-effective manner. This remains a challenge for RNA because its rugged energy landscape results in slow conformational sampling and accurate results typically require explicit solvent which increases computational cost. To address this, we performed both traditional and modified replica exchange molecular dynamics simulations on a test system (alanine dipeptide) and an RNA tetramer known to populate A-form-like conformations in solution (single-stranded rGACC). A key focus is on providing the means to demonstrate that convergence is obtained, for example by investigating replica RMSD profiles and/or detailed ensemble analysis through clustering. We found that traditional replica exchange simulations still require prohibitive time and resource expenditures, even when using GPU accelerated hardware, and our results are not well converged even at 2 microseconds of simulation time per replica. In contrast, a modified version of replica exchange, reservoir replica exchange in explicit solvent, showed much better convergence and proved to be both a cost-effective and reliable alternative to the traditional approach. We expect this method will be attractive for future research that requires quantitative conformational analysis from explicitly solvated simulations.
Bromodomains, protein domains involved in epigenetic regulation, are able to bind small molecules with high affinity. In the present study, we report free energy calculations for the binding of seven ligands to the first BRD4 bromodomain, using the attach-pull-release (APR) method to compute the reversible work of removing the ligands from the binding site and then allowing the protein to relax conformationally. We test three different water models, TIP3P, TIP4PEw, and SPC/E, as well as the GAFF and GAFF2 parameter sets for the ligands. Our simulations show that the apo crystal structure of BRD4 is only metastable, with a structural transition happening in the absence of the ligand typically after 20 ns of simulation. We compute the free energy change for this transition with a separate APR calculation on the free protein and include its contribution to the ligand binding free energies, which generally causes an underestimation of the affinities. By testing different water models and ligand parameters, we are also able to assess their influence in our results and determine which one produces the best agreement with the experimental data. Both free energies associated with the conformational change and ligand binding are affected by the choice of water model, with the two sets of ligand parameters affecting their binding free energies to a lesser degree. Across all six combinations of water model and ligand potential function, the Pearson correlation coefficients between calculated and experimental binding free energies range from 0.55 to 0.83, and the root-mean-square errors range from 1.4–3.2 kcal/mol. The current protocol also yields encouraging preliminary results when used to assess the relative stability of ligand poses generated by docking or other methods, as illustrated for two different ligands. Our method takes advantage of the high performance provided by graphics processing units and can readily be applied to other ligands as well as other protein systems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.