In recent years, machine learning (ML) methods have become increasingly popular in computational chemistry. After being trained on appropriate ab initio reference data, these methods can accurately predict the properties of chemical systems, circumventing the need to explicitly solve the electronic Schrödinger equation. Because of their computational efficiency and scalability to large datasets, deep neural networks (DNNs) are a particularly promising ML algorithm for chemical applications. This work introduces PhysNet, a DNN architecture designed for predicting energies, forces and dipole moments of chemical systems. PhysNet achieves state-of-the-art performance on the QM9, MD17 and ISO17 benchmarks. Further, two new datasets are generated in order to probe the performance of ML models for describing chemical reactions, long-range interactions, and condensed phase systems. It is shown that explicitly including electrostatics in energy predictions is crucial for a qualitatively correct description of the asymptotic regions of a potential energy surface (PES). PhysNet models trained on a systematically constructed set of small peptide fragments (at most eight heavy atoms) are able to generalize to considerably larger proteins such as deca-alanine (Ala10): the optimized geometry of helical Ala10 predicted by PhysNet is virtually identical to ab initio results (RMSD = 0.21 Å). In unbiased molecular dynamics (MD) simulations of Ala10 on the PhysNet PES in the gas phase, Ala10 folds not into a helical structure but into a "wreath-shaped" configuration, which is more stable than the helical form by 0.46 kcal mol⁻¹ according to the reference ab initio calculations.
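The central idea, a model energy whose negative gradient gives the forces, with an explicit electrostatic term fixing the long-range asymptotics, can be sketched in a few lines. Everything below (the short-range form, the charges, the units) is an illustrative assumption, not PhysNet's actual architecture:

```python
import numpy as np

# Toy illustration (not PhysNet itself): a short-range model alone cannot
# reproduce the 1/r Coulomb asymptote of a diatomic PES, but adding an
# explicit electrostatic term with (hypothetical) predicted charges does.

def short_range(r, decay=1.0):
    """Stand-in for a learned short-range energy: decays to zero quickly."""
    return -2.0 * np.exp(-decay * r)

def energy(r, q1=1.0, q2=-1.0, with_electrostatics=True):
    """Total energy in reduced units; charges q1, q2 are assumed values."""
    e = short_range(r)
    if with_electrostatics:
        e += q1 * q2 / r          # analytic Coulomb term, exact as r -> inf
    return e

def force(r, h=1e-5, **kw):
    """Force along the bond coordinate via central finite differences."""
    return -(energy(r + h, **kw) - energy(r - h, **kw)) / (2 * h)

r_far = 50.0
# Without electrostatics the asymptotic energy is ~0; with it, ~ -1/r.
e_short_only = energy(r_far, with_electrostatics=False)
e_full = energy(r_far, with_electrostatics=True)
```

At large separation the short-range term has decayed to numerical zero, so only the analytic Coulomb term supplies the correct attraction and force.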
In recent years, the use of machine learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim of narrowing the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail, and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
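The core statistical idea, learning the structure-to-energy map from reference data without a preconceived functional form, can be illustrated with kernel ridge regression on a one-dimensional toy problem. The Morse-like reference and Gaussian kernel below are assumptions for illustration, not any specific ML-FF:

```python
import numpy as np

# Minimal sketch of the ML-FF idea: learn the map from structure (here a
# single bond length) to energy from reference data, with no fixed
# functional form. Kernel ridge regression with a Gaussian kernel is used
# as a stand-in for the more elaborate models discussed in the text.

def ref_energy(r):
    """Toy 'ab initio' reference: a Morse potential (assumed parameters)."""
    return (1.0 - np.exp(-1.5 * (r - 1.0))) ** 2

def gaussian_kernel(a, b, sigma=0.3):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma**2))

# Reference data: energies at a handful of geometries.
r_train = np.linspace(0.6, 3.0, 25)
e_train = ref_energy(r_train)

# Fit: solve (K + lambda * I) alpha = e once.
K = gaussian_kernel(r_train, r_train)
alpha = np.linalg.solve(K + 1e-8 * np.eye(len(r_train)), e_train)

def predict(r):
    return gaussian_kernel(np.atleast_1d(r), r_train) @ alpha

# Interpolation error at unseen geometries is small.
r_test = np.linspace(0.7, 2.9, 50)
err = np.max(np.abs(predict(r_test) - ref_energy(r_test)))
```

The small ridge term stabilizes the linear solve; in a real ML-FF it also controls the trade-off between fitting noise and generalization.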
Machine-learned force fields combine the accuracy of ab initio methods with the efficiency of conventional force fields. However, current machine-learned force fields typically ignore electronic degrees of freedom, such as the total charge or spin state, and assume chemical locality, which is problematic when molecules have inconsistent electronic states, or when nonlocal effects play a significant role. This work introduces SpookyNet, a deep neural network for constructing machine-learned force fields with explicit treatment of electronic degrees of freedom and nonlocality, modeled via self-attention in a transformer architecture. Chemically meaningful inductive biases and analytical corrections built into the network architecture allow it to properly model physical limits. SpookyNet improves upon the current state-of-the-art (or achieves similar performance) on popular quantum chemistry data sets. Notably, it is able to generalize across chemical and conformational space and can leverage the learned chemical insights, e.g. by predicting unknown spin states, thus helping to close a further important remaining gap for today’s machine learning models in quantum chemistry.
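The nonlocal building block mentioned above, self-attention, can be sketched in a few lines of NumPy. The random projection matrices stand in for learned weights; nothing below reproduces SpookyNet's actual architecture:

```python
import numpy as np

# Minimal numpy sketch of the self-attention mechanism used to capture
# nonlocal effects: every atom's update mixes information from all other
# atoms, weighted by learned (here: random stand-in) projections. The
# shapes follow the standard transformer formulation.

rng = np.random.default_rng(0)
n_atoms, d = 6, 8                      # atoms and feature dimension
X = rng.normal(size=(n_atoms, d))      # per-atom feature vectors
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # softmax over all atoms
X_new = weights @ V                    # each atom attends to every atom
```

Because the softmax runs over all atoms rather than a neighbor list, information can propagate between arbitrarily distant atoms in a single update, which is precisely what a cutoff-based local descriptor cannot do.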
In the early days of computation, slow processor speeds limited the amount of data that could be generated and used for scientific purposes. In the age of big data, the limiting factor is usually the method with which large amounts of data are analyzed and useful information is extracted. Typical examples from chemistry are high-level ab initio calculations for small systems, which have nowadays become feasible even when energies at many different geometries are required. Molecular dynamics simulations often require several thousand distinct trajectories to be run. Under such circumstances, suitable analytical representations of potential energy surfaces (PESs) based on ab initio calculations are required to propagate the dynamics at an acceptable cost. In this work we introduce a toolkit which allows the automatic construction of multidimensional PESs from gridded ab initio data based on reproducing kernel Hilbert space (RKHS) theory. The resulting representations require no tuning of parameters and allow energy and force evaluations at ab initio quality at the same cost as empirical force fields. Although the toolkit is primarily intended for constructing multidimensional potential energy surfaces for molecular systems, it can also be used for general machine learning purposes. The software is published under the MIT license and can be downloaded, modified, and used in other projects for free.
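The basic RKHS workflow, solving a linear system on the grid once and then evaluating a kernel expansion anywhere, can be sketched as follows. A simple textbook kernel stands in for the toolkit's reciprocal-power kernels, which additionally encode physically correct long-range decay:

```python
import numpy as np

# Sketch of RKHS interpolation: given energies on a grid, solve
# K alpha = E once, then evaluate anywhere as a kernel expansion.
# The textbook kernel k(x, x') = min(x, x') is a stand-in for the
# reciprocal-power kernels of the actual toolkit.

def kernel(a, b):
    return np.minimum(a[:, None], b[None, :])

x_grid = np.linspace(0.5, 5.0, 10)     # gridded 'ab initio' geometries
e_grid = np.sin(x_grid)                # toy energies on the grid

alpha = np.linalg.solve(kernel(x_grid, x_grid), e_grid)

def interpolate(x):
    return kernel(np.atleast_1d(x), x_grid) @ alpha

# The representation passes exactly through the reference points ...
max_train_err = np.max(np.abs(interpolate(x_grid) - e_grid))
# ... and interpolates in between (piecewise-linearly for this kernel).
```

No parameters are tuned: once the kernel is chosen, the representation is fully determined by the grid data, which is the property the toolkit exploits.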
Despite the ever-increasing computer power, accurate ab initio calculations for large systems (thousands to millions of atoms) remain infeasible. Instead, approximate empirical energy functions are used. Most current approaches are either transferable between different chemical systems, but not particularly accurate, or they are fine-tuned to a specific application. In this work, a data-driven method to construct a potential energy surface based on neural networks is presented. Since the total energy is decomposed into local atomic contributions, the evaluation is easily parallelizable and scales linearly with system size. With prediction errors below 0.5 kcal mol⁻¹ for both unknown molecules and configurations, the method is accurate across chemical and configurational space, which is demonstrated by applying it to datasets from nonreactive and reactive molecular dynamics simulations and a diverse database of equilibrium structures. The possibility to use small molecules as reference data to predict larger structures is also explored. Since the descriptor only uses local information, high-level ab initio methods, which are computationally too expensive for large molecules, become feasible for generating the necessary reference data used to train the neural network.
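The locality argument can be made concrete: if each atomic contribution only sees neighbors within a cutoff, the energy per atom converges with system size, which is what allows training on small molecules and predicting larger ones. A minimal sketch with a hypothetical pairwise per-atom model (not the neural network of this work):

```python
import numpy as np

# Sketch of the local decomposition E_total = sum_i E_i(environment_i):
# each atomic contribution depends only on neighbors within a cutoff, so
# an evaluation costs O(N) with a neighbor list. The per-atom model below
# is a hypothetical pairwise stand-in for the neural network in the text.

CUTOFF = 5.0

def atomic_energy(i, positions):
    """Hypothetical local model: a short-range pair term summed over the
    neighbors of atom i inside the cutoff."""
    d = np.linalg.norm(positions - positions[i], axis=1)
    mask = (d > 0) & (d < CUTOFF)
    return np.sum(np.exp(-d[mask]))

def total_energy(positions):
    return sum(atomic_energy(i, positions) for i in range(len(positions)))

def chain(n):
    """Linear chain of n atoms with unit spacing along x."""
    pos = np.zeros((n, 3))
    pos[:, 0] = np.arange(n)
    return pos

# Energy per atom converges as the chain grows (only edge effects differ),
# so a model trained on small fragments transfers to large systems.
e_small = total_energy(chain(50)) / 50
e_large = total_energy(chain(500)) / 500
```

The difference between the two per-atom energies shrinks as 1/N, since only the chain ends have incomplete environments.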
The "in silico" exploration of chemical, physical and biological systems requires accurate and efficient energy functions to follow their nuclear dynamics at a molecular and atomistic level. Recently, machine learning tools have gained a lot of attention in the field of molecular sciences and simulations and are increasingly used to investigate the dynamics of such systems. Among the various approaches, artificial neural networks (NNs) are one promising tool to learn a representation of potential energy surfaces. This is done by formulating the problem as a mapping from a set of atomic positions x and nuclear charges Z_i to a potential energy V(x). Here, a fully-dimensional, reactive neural network representation for malonaldehyde (MA), acetoacetaldehyde (AAA) and acetylacetone (AcAc) is learned. It is used to run finite-temperature molecular dynamics simulations, and to determine the infrared spectra and the hydrogen transfer rates for the three molecules. The finite-temperature infrared spectrum for MA based on the NN learned on MP2 reference data provides a realistic representation of the low-frequency modes and the H-transfer band, whereas the CH vibrations are somewhat too high in frequency. For AAA it is demonstrated that the IR spectroscopy is sensitive to the position of the transferring hydrogen at either the OCH or the OCCH₃ end of the molecule. For the hydrogen transfer rates it is demonstrated that the O-O vibration (at ~250 cm⁻¹) is a gating mode and largely determines the rate at which the hydrogen is transferred between the donor and the acceptor. Finally, possibilities to further improve such NN-based potential energy surfaces are explored. They include the transferability of an NN-learned energy function across chemical species (here methylation) and transfer learning from a lower level of reference data (MP2) to a higher level of theory (pair natural orbital LCCSD(T)).
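A common way to obtain such finite-temperature IR spectra is to Fourier-transform the dipole moment time series from the MD trajectory. A minimal sketch with a synthetic dipole signal, where the two frequencies are illustrative stand-ins for a ~250 cm⁻¹ gating mode and a CH stretch:

```python
import numpy as np

# Sketch of how an infrared spectrum is extracted from a dynamics
# trajectory: Fourier-transform the dipole moment time series. Here the
# dipole is synthetic; in the text it comes from MD on the NN surface.
# Frequencies and amplitudes are arbitrary illustration values.

dt_fs = 0.5                              # time step in femtoseconds
t = np.arange(2**14) * dt_fs             # trajectory time axis
# synthetic dipole: a ~250 cm^-1 mode plus a weaker ~3000 cm^-1 stretch
mu = np.cos(2 * np.pi * 7.5e-3 * t) + 0.3 * np.cos(2 * np.pi * 9.0e-2 * t)

spectrum = np.abs(np.fft.rfft(mu)) ** 2
freq_per_fs = np.fft.rfftfreq(len(mu), d=dt_fs)     # cycles per fs
wavenumber = freq_per_fs * 1e15 / 2.99792458e10     # convert to cm^-1

peak = wavenumber[np.argmax(spectrum)]              # dominant band
ch_peak = wavenumber[np.argmax(np.where(wavenumber > 1000.0, spectrum, 0.0))]
```

In practice one transforms the dipole autocorrelation function and applies quantum correction factors, but the peak positions already emerge from this raw power spectrum.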
High-temperature, reactive gas flow is inherently nonequilibrium in terms of energy and state population distributions. Modeling such conditions is challenging even for the smallest molecular systems due to the extremely large number of accessible states and transitions between them. Here, neural networks (NNs) trained on explicitly simulated data are constructed and shown to provide quantitatively realistic descriptions which can be used in mesoscale simulation approaches such as Direct Simulation Monte Carlo to model gas flow in the hypersonic regime. As an example, the state-to-state cross sections for N(⁴S) + NO(²Π) → O(³P) + N₂(X¹Σg⁺) are computed from quasiclassical trajectory (QCT) simulations. By training NNs on a sparsely sampled, noisy set of state-to-state cross sections, it is demonstrated that independently generated reference data are predicted with high accuracy. State-specific and total reaction rates as a function of temperature from the NN are in quantitative agreement with explicit QCT simulations and confirm earlier simulations, and the final state distributions of the vibrational and rotational energies agree as well. Thus, NNs trained on physical reference data can provide a viable alternative to the computationally demanding explicit evaluation of the microscopic information at run time. This will considerably advance the ability to realistically model nonequilibrium ensembles for network-based simulations.
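Once cross sections are available, whether from explicit QCT or from the trained NN, thermal rate coefficients follow from a Maxwell-Boltzmann average over collision energies. A sketch in reduced units (kB = 1, unit reduced mass), with a toy threshold cross section standing in for the QCT/NN data:

```python
import numpy as np

# Sketch of the post-processing step: given cross sections sigma(E), the
# thermal rate coefficient is the Maxwell-Boltzmann average
#   k(T) proportional to integral of sigma(E) * E * exp(-E / kB T) dE.
# Reduced units and a toy line shape are used; a trained NN would simply
# replace the sigma() function below.

def sigma(E, E_threshold=0.5):
    """Toy reaction cross section with a threshold (assumed form)."""
    return np.where(E > E_threshold, 1.0 - E_threshold / E, 0.0)

def rate(T, n=4000, E_max=60.0):
    E = np.linspace(1e-6, E_max, n)
    dE = E[1] - E[0]
    boltz = E * np.exp(-E / T)
    integral = np.sum(sigma(E) * boltz) * dE
    norm = np.sum(boltz) * dE            # normalize the thermal average
    return np.sqrt(8.0 * T / np.pi) * integral / norm

# For a threshold-dominated reaction the rate rises with temperature.
k_low, k_high = rate(1.0), rate(5.0)
```

Replacing `sigma` by an NN evaluated per initial state, and summing over state populations, gives the state-specific and total rates compared against explicit QCT in the text.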
Global machine learning force fields, with the capacity to capture collective interactions in molecular systems, now scale up to a few dozen atoms due to considerable growth of model complexity with system size. For larger molecules, locality assumptions are introduced, with the consequence that nonlocal interactions are not described. Here, we develop an exact iterative approach to train global symmetric gradient domain machine learning (sGDML) force fields (FFs) for several hundred atoms, without resorting to any potentially uncontrolled approximations. All atomic degrees of freedom remain correlated in the global sGDML FF, allowing the accurate description of complex molecules and materials that present phenomena with far-reaching characteristic correlation lengths. We assess the accuracy and efficiency of sGDML on a newly developed MD22 benchmark dataset containing molecules from 42 to 370 atoms. The robustness of our approach is demonstrated in nanosecond path-integral molecular dynamics simulations for supramolecular complexes in the MD22 dataset.
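The gradient-domain idea, training directly on forces using the kernel's mixed second derivative as the covariance between force observations, can be illustrated in one dimension. The Gaussian kernel and toy double-well are assumptions; the real sGDML additionally symmetrizes over molecular permutations and works in full Cartesian space:

```python
import numpy as np

# 1D sketch of gradient-domain learning: fit forces with the kernel's
# mixed second derivative d^2 k / dx dx', which is the covariance between
# two force observations. A toy double-well stands in for ab initio data.

SIGMA = 0.4

def k_ff(a, b):
    """d^2 k / dx dx' for a Gaussian kernel (force-force covariance)."""
    d = a[:, None] - b[None, :]
    g = np.exp(-d**2 / (2.0 * SIGMA**2))
    return (1.0 / SIGMA**2 - d**2 / SIGMA**4) * g

def ref_force(x):
    """Force of a toy double-well energy E(x) = x^4 - x^2."""
    return -(4.0 * x**3 - 2.0 * x)

x_train = np.linspace(-1.5, 1.5, 30)
alpha = np.linalg.solve(k_ff(x_train, x_train) + 1e-6 * np.eye(len(x_train)),
                        ref_force(x_train))

def predict_force(x):
    return k_ff(np.atleast_1d(x), x_train) @ alpha

x_test = np.linspace(-1.4, 1.4, 57)
max_err = np.max(np.abs(predict_force(x_test) - ref_force(x_test)))
```

Because the model is built from derivatives of a single scalar kernel, the predicted force field is the exact gradient of an underlying energy model, i.e. it conserves energy by construction.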