Modern molecular mechanics force fields are widely used for modeling the dynamics and interactions of small organic molecules using libraries of transferable force field parameters. However, for molecules outside the training set, the parameters are potentially inaccurate and it may be preferable to derive molecule-specific parameters. Here we present an intuitive parameter derivation toolkit, QUBEKit (QUantum mechanical BEspoke Kit), which enables the automated generation of system-specific small molecule force field parameters directly from quantum mechanics. QUBEKit is written in Python and combines bond, angle, torsion, charge, and Lennard-Jones parameter derivation methodologies alongside a method for deriving the positions and charges of off-center virtual sites from the partitioned quantum mechanical electron density. As a proof of concept, we have rederived a complete set of parameters for 109 small organic molecules and assessed the accuracy by comparing computed liquid properties with experiment. QUBEKit gives competitive results when compared to standard transferable force fields, with mean unsigned errors of 0.024 g/cm³, 0.79 kcal/mol, and 1.17 kcal/mol for the liquid density, heat of vaporization, and free energy of hydration, respectively. This indicates that the derived parameters are suitable for molecular modeling applications, including computer-aided drug design.
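The mean unsigned error quoted above is the average absolute deviation between computed and experimental values. A minimal sketch of that metric follows; the function name and the density values are hypothetical and chosen only for illustration, and none of it is taken from QUBEKit or the paper's data.

```python
from typing import Sequence


def mean_unsigned_error(computed: Sequence[float], reference: Sequence[float]) -> float:
    """Return the mean of |computed - reference| over paired values."""
    if len(computed) != len(reference) or len(computed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(c - r) for c, r in zip(computed, reference)) / len(computed)


# Hypothetical liquid densities in g/cm^3 (illustrative values, not from the paper).
computed_density = [0.80, 1.00, 0.90]
experimental_density = [0.79, 1.03, 0.92]

mue = mean_unsigned_error(computed_density, experimental_density)
print(f"MUE = {mue:.3f} g/cm^3")
```

Because the absolute value is taken before averaging, over- and under-estimates do not cancel, which is why the metric is commonly reported alongside signed errors.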
Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high-quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready-to-use potential functions for use in molecular simulations.
We introduce the Open Force Field (OpenFF) 2.0.0 small molecule force field for drug-like molecules, codenamed Sage, which builds upon our previous iteration, Parsley. OpenFF force fields are based on direct chemical perception, which generalizes easily to highly diverse sets of chemistries based on substructure queries. Like the previous OpenFF iterations, the Sage generation of OpenFF force fields was validated in protein-ligand simulations to be compatible with AMBER biopolymer force fields. In this work, we detail the methodology used to develop this force field, as well as the innovations and improvements introduced since the release of Parsley 1.0.0. One particularly significant feature of Sage is a set of improved Lennard-Jones (LJ) parameters retrained against condensed phase mixture data, the first refit of LJ parameters in the OpenFF small molecule force field line. Sage also includes valence parameters refit to a larger database of quantum chemical calculations than previous versions, as well as improvements in how this fitting is performed. Force field benchmarks show improvements in general metrics of performance against quantum chemistry reference data such as root-mean-square deviations (RMSD) of optimized conformer geometries, torsion fingerprint deviations (TFD), and improved relative conformer energetics (ΔΔE). We present a variety of benchmarks for these metrics against our previous force fields and, in some cases, against other small molecule force fields. Sage also demonstrates improved performance in estimating physical properties, including comparison against experimental data from various thermodynamic databases for small molecule properties such as ΔH_mix, ρ(x), ΔG_solv, and ΔG_trans. Additionally, we benchmarked against protein-ligand binding free energies (ΔG_bind), where Sage yields results statistically similar to previous force fields.
All the data is made publicly available along with complete details on how to reproduce the training results at https://github.com/openforcefield/openff-sage.
The development of accurate transferable force fields is key to realizing the full potential of atomistic modeling in the study of biological processes such as protein–ligand binding for drug discovery. State-of-the-art transferable force fields, such as those produced by the Open Force Field Initiative, use modern software engineering and automation techniques to yield accuracy improvements. However, force field torsion parameters, which must account for many stereoelectronic and steric effects, are considered to be less transferable than other force field parameters and are therefore often targets for bespoke parametrization. Here, we present the Open Force Field QCSubmit and BespokeFit software packages that, when combined, facilitate the fitting of torsion parameters to quantum mechanical reference data at scale. We demonstrate the use of QCSubmit for simplifying the process of creating and archiving large numbers of quantum chemical calculations, by generating a dataset of 671 torsion scans for druglike fragments. We use BespokeFit to derive individual torsion parameters for each of these molecules, thereby reducing the root-mean-square error in the potential energy surface from 1.1 kcal/mol, using the original transferable force field, to 0.4 kcal/mol using the bespoke version. Furthermore, we employ the bespoke force fields to compute the relative binding free energies of a congeneric series of inhibitors of the TYK2 protein, and demonstrate further improvements in accuracy, compared to the base force field (MUE reduced from 0.56 [0.39, 0.77] to 0.42 [0.28, 0.59] kcal/mol and R² correlation improved from 0.72 [0.35, 0.87] to 0.93 [0.84, 0.97], with lower and upper bounds in brackets).
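The root-mean-square error in the potential energy surface mentioned above can be illustrated with a small sketch. It assumes the common periodic torsion form E(φ) = Σ k (1 + cos(nφ − phase)) and compares relative energies along a one-dimensional scan. All parameter values, the synthetic "reference" profile, and the function names are hypothetical; this is not the BespokeFit API or its fitting procedure.

```python
import math
from typing import List, Sequence, Tuple

# One torsion term: (force constant k in kcal/mol, periodicity n, phase in radians).
Term = Tuple[float, int, float]


def torsion_energy(phi: float, terms: Sequence[Term]) -> float:
    """Periodic torsion energy: sum of k * (1 + cos(n * phi - phase))."""
    return sum(k * (1.0 + math.cos(n * phi - phase)) for k, n, phase in terms)


def relative_rmse(reference: Sequence[float], model: Sequence[float]) -> float:
    """RMSE between two energy profiles after shifting each to its own minimum."""
    ref_rel = [e - min(reference) for e in reference]
    mod_rel = [e - min(model) for e in model]
    squared = [(m - r) ** 2 for m, r in zip(mod_rel, ref_rel)]
    return math.sqrt(sum(squared) / len(squared))


# Scan grid from -180 to 165 degrees in 15 degree steps.
angles: List[float] = [math.radians(a) for a in range(-180, 180, 15)]

# Synthetic stand-in for a quantum mechanical scan (hypothetical values).
reference_terms: List[Term] = [(1.2, 2, math.pi), (0.3, 3, 0.0)]
# Hypothetical general-purpose parameters versus molecule-specific ones.
transferable_terms: List[Term] = [(0.6, 2, math.pi)]
bespoke_terms: List[Term] = [(1.1, 2, math.pi), (0.25, 3, 0.0)]

reference = [torsion_energy(phi, reference_terms) for phi in angles]
rmse_transferable = relative_rmse(
    reference, [torsion_energy(phi, transferable_terms) for phi in angles]
)
rmse_bespoke = relative_rmse(
    reference, [torsion_energy(phi, bespoke_terms) for phi in angles]
)

print(f"transferable RMSE: {rmse_transferable:.3f} kcal/mol")
print(f"bespoke RMSE:      {rmse_bespoke:.3f} kcal/mol")
```

Shifting each profile to its own minimum before comparison reflects that only relative energies along the scan are meaningful; other alignment choices are possible and would change the reported number.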
Community efforts in the computational molecular sciences (CMS) are evolving toward modular, open, and interoperable interfaces that work with existing community codes to provide more functionality and composability than could be achieved with a single program. The Quantum Chemistry Common Driver and Databases (QCDB) project provides such capability through an application programming interface (API) that facilitates interoperability across multiple quantum chemistry software packages. In tandem with the Molecular Sciences Software Institute and their Quantum Chemistry Archive ecosystem, the unique functionalities of several CMS programs are integrated, including CFOUR, GAMESS, NWChem, OpenMM, Psi4, Qcore, TeraChem, and Turbomole, to provide common computational functions, i.e., energy, gradient, and Hessian computations as well as molecular properties such as atomic charges and vibrational frequency analysis. Both standard users and power users benefit from adopting these APIs as they lower the language barrier of input styles and enable a standard layout of variables and data. These designs allow end-to-end interoperable programming of complex computations and provide best practices options by default.