The applications of machine learning techniques to chemistry and materials science become more numerous by the day. The main challenge is to devise representations of atomic systems that are at the same time complete and concise, so as to reduce the number of reference calculations needed to predict the properties of different types of materials reliably. This has led to a proliferation of alternative ways to convert an atomic structure into an input for a machine-learning model. We introduce an abstract definition of chemical environments that is based on a smoothed atomic density, using a bra-ket notation to emphasize basis set independence and to highlight the connections with some popular choices of representations for describing atomic systems. The correlations between the spatial distribution of atoms and their chemical identities are computed as inner products between these feature kets. The kets can be given an explicit representation both in terms of the expansion of the atom density on orthogonal basis functions, which is equivalent to the smooth overlap of atomic positions (SOAP) power spectrum, and in real space, where they correspond to n-body correlations of the atom density. This formalism lays the foundations for a more systematic tuning of the behavior of the representations, by introducing operators that represent the correlations between structure, composition, and the target properties. It provides a unifying picture of recent developments in the field and indicates a way forward towards more effective and computationally affordable machine-learning schemes for molecules and materials. (arXiv:1807.00408v3 [physics.chem-ph])
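To make the link between density-expansion coefficients and the SOAP power spectrum concrete, here is a minimal sketch. It assumes a single chemical species, real spherical harmonics, and hypothetical random coefficients standing in for ⟨nlm|ρ⟩, and it omits normalization constants. Contracting over m gives p[n, n', l]. Because a rotation acts on each l block of real-harmonic coefficients through an orthogonal matrix, applying any orthogonal mixing to the m index leaves the power spectrum unchanged. The random matrices used here are a superset of true rotations.

```python
import numpy as np


def power_spectrum(coeffs):
    """Rotation-invariant power spectrum p[n, n', l] = sum_m c[n,l,m] c[n',l,m].

    coeffs[l] is an (n_max, 2l+1) array of real-harmonic expansion
    coefficients <nlm|rho> for a single-species environment (hypothetical input).
    Normalization constants are omitted.
    """
    n_max = coeffs[0].shape[0]
    p = np.zeros((n_max, n_max, len(coeffs)))
    for l, c in enumerate(coeffs):
        p[:, :, l] = c @ c.T  # contract over the m index
    return p


def random_orthogonal(dim, rng):
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix
    q, r = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q * np.sign(np.diag(r))


def normalized_kernel(p1, p2):
    # linear kernel: inner product of the flattened, normalized feature vectors
    a, b = p1.ravel(), p2.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
n_max, l_max = 4, 3
coeffs = [rng.normal(size=(n_max, 2 * l + 1)) for l in range(l_max + 1)]
# A rotation mixes the m components of each l block with an orthogonal matrix
rotated = [c @ random_orthogonal(2 * l + 1, rng).T for l, c in enumerate(coeffs)]
p, p_rot = power_spectrum(coeffs), power_spectrum(rotated)
print(np.allclose(p, p_rot))  # True
```

Several species would add species indices to the coefficients and to p. The inner-product structure stays the same.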
We show that a single change in the derivation of the linearized semiclassical initial value representation (LSC-IVR or 'classical Wigner approximation') results in a classical dynamics that conserves the quantum Boltzmann distribution. We rederive the (standard) LSC-IVR approach by writing the (exact) quantum time-correlation function in terms of the normal modes of a free ring polymer (i.e. a discrete imaginary-time Feynman path), taking the limit that the number of polymer beads N → ∞, such that the lowest normal-mode frequencies take their 'Matsubara' values. The change we propose is to truncate the quantum Liouvillian, not explicitly in powers of ħ² at ħ⁰ (which gives back the standard LSC-IVR approximation), but in the normal-mode derivatives corresponding to the lowest Matsubara frequencies. The resulting 'Matsubara' dynamics is inherently classical (since all terms O(ħ²) disappear from the Matsubara Liouvillian in the limit N → ∞), and conserves the quantum Boltzmann distribution because the Matsubara Hamiltonian is symmetric with respect to imaginary-time translation. Numerical tests show that the Matsubara approximation to the quantum time-correlation function converges with respect to the number of modes, and gives better agreement than LSC-IVR with the exact quantum result. Matsubara dynamics is too computationally expensive to be applied to complex systems, but further approximations to it may lead to practical methods.
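The statement that the lowest free-ring-polymer normal-mode frequencies tend to their Matsubara values as N → ∞ can be checked numerically. The sketch below diagonalizes the cyclic spring matrix of a free ring polymer with spring frequency ω_N = N/(βħ). It assumes unit mass and atomic units (ħ = 1), and the function names are hypothetical. The resulting frequencies 2ω_N sin(πk/N) are then compared with 2πk/(βħ).

```python
import numpy as np


def ring_polymer_frequencies(n_beads, beta, hbar=1.0):
    """Normal-mode frequencies of a free ring polymer (units with m = 1).

    Spring energy: sum_j (1/2) omega_N^2 (q_j - q_{j+1})^2, omega_N = n_beads / (beta * hbar).
    """
    omega_n = n_beads / (beta * hbar)
    eye = np.eye(n_beads)
    shift = np.roll(eye, 1, axis=1)  # couples bead j to its cyclic neighbour j+1
    hessian = omega_n**2 * (2 * eye - shift - shift.T)
    eigvals = np.linalg.eigvalsh(hessian)  # ascending order
    return np.sqrt(np.clip(eigvals, 0.0, None))  # clip round-off below zero


def matsubara_frequencies(k, beta, hbar=1.0):
    # Matsubara frequencies 2*pi*k / (beta * hbar)
    return 2 * np.pi * np.asarray(k) / (beta * hbar)


beta = 1.0
freqs = ring_polymer_frequencies(512, beta)
# freqs[0] is the centroid (zero frequency); modes k and N-k are degenerate pairs,
# so indices 1, 3, 5 pick one representative for k = 1, 2, 3
ratios = freqs[1:7:2] / matsubara_frequencies([1, 2, 3], beta)
print(ratios)  # each ratio is 1 to within about 1e-4
```

The deviation scales as (πk/N)²/6, so only the lowest modes (small k/N) reach their Matsubara values. This is why the truncation is made in those modes.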
We recently obtained a quantum-Boltzmann-conserving classical dynamics by making a single change to the derivation of the "Classical Wigner" approximation. Here, we show that the further approximation of this "Matsubara dynamics" gives rise to two popular heuristic methods for treating quantum Boltzmann time-correlation functions: centroid molecular dynamics (CMD) and ring-polymer molecular dynamics (RPMD). We show that CMD is a mean-field approximation to Matsubara dynamics, obtained by discarding (classical) fluctuations around the centroid, and that RPMD is the result of discarding a term in the Matsubara Liouvillian which shifts the frequencies of these fluctuations. These findings are consistent with previous numerical results and give explicit formulae for the terms that CMD and RPMD leave out.
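For reference, a minimal sketch of what RPMD itself propagates may help: every bead moves under the ring-polymer Hamiltonian H_N, with springs of frequency ω_N = N/(βħ) plus the physical potential on each bead. The bead average (the centroid) is the coordinate whose mean-field dynamics CMD propagates instead. The sketch assumes one dimension, ħ = 1, a hypothetical quartic potential and velocity-Verlet integration. It illustrates standard RPMD, not the Matsubara-based derivation described above.

```python
import numpy as np


def rp_forces(q, m, omega_n, dVdq):
    # spring forces from neighbours j-1 and j+1 plus the physical force on each bead
    spring = -m * omega_n**2 * (2 * q - np.roll(q, 1) - np.roll(q, -1))
    return spring - dVdq(q)


def rp_energy(q, p, m, omega_n, V):
    kinetic = np.sum(p**2) / (2 * m)
    springs = 0.5 * m * omega_n**2 * np.sum((q - np.roll(q, -1)) ** 2)
    return kinetic + springs + np.sum(V(q))


def rpmd_run(q, p, m, beta, dt, n_steps, V, dVdq, hbar=1.0):
    """Velocity-Verlet integration of the ring-polymer Hamiltonian H_N (1D)."""
    omega_n = len(q) / (beta * hbar)
    f = rp_forces(q, m, omega_n, dVdq)
    centroids = []
    for _ in range(n_steps):
        p = p + 0.5 * dt * f
        q = q + dt * p / m
        f = rp_forces(q, m, omega_n, dVdq)
        p = p + 0.5 * dt * f
        centroids.append(q.mean())  # the centroid coordinate that CMD would propagate
    return q, p, np.array(centroids)


def V(q):
    # hypothetical anharmonic test potential
    return 0.5 * q**2 + 0.1 * q**4


def dVdq(q):
    return q + 0.4 * q**3


rng = np.random.default_rng(1)
n_beads, beta, m = 16, 1.0, 1.0
q0 = 1.0 + 0.05 * rng.normal(size=n_beads)
p0 = rng.normal(scale=np.sqrt(m * n_beads / beta), size=n_beads)
q1, p1, centroids = rpmd_run(q0, p0, m, beta, dt=1e-3, n_steps=2000, V=V, dVdq=dVdq)
omega_n = n_beads / beta
e0, e1 = rp_energy(q0, p0, m, omega_n, V), rp_energy(q1, p1, m, omega_n, V)
print(abs(e1 - e0) / e0)  # small: only the bounded velocity-Verlet error
```

The time step has to resolve the stiffest internal mode, whose frequency is about 2ω_N. This is one reason these path-integral methods are more costly than classical MD.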
We present a scheme to obtain an inexpensive and reliable estimate of the uncertainty associated with the predictions of a machine-learning model of atomic and molecular properties. The scheme is based on resampling, with multiple models being generated based on sub-sampling of the same training data. The accuracy of the uncertainty prediction can be benchmarked by maximum likelihood estimation, which can also be used to correct for correlations between resampled models, and to improve the performance of the uncertainty estimation by a cross-validation procedure. In the case of sparse Gaussian Process Regression models, this resampled estimator can be evaluated at negligible cost. We demonstrate the reliability of these estimates for the prediction of molecular energetics, and for the estimation of nuclear chemical shieldings in molecular crystals. Extension to estimate the uncertainty in energy differences, forces, or other correlated predictions is straightforward. This method can be easily applied to other machine learning schemes, and will be beneficial to make data-driven predictions more reliable, and to facilitate training-set optimization and active-learning strategies. (arXiv:1809.07653v1 [physics.chem-ph], 20 Sep 2018)
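A minimal sketch of the resampling idea follows. It trains a committee of models on random subsamples of the training data and uses the spread of their predictions as an uncertainty estimate. It then rescales that spread by a factor α² fitted on a validation set by maximum likelihood. Several pieces are assumptions for illustration: polynomial ridge regression stands in for the Gaussian Process models, the data are hypothetical, and α² = mean(r²/σ²) is the plain Gaussian maximum-likelihood estimate, without the finite-committee and correlation corrections the text refers to.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-3):
    # closed-form ridge regression weights
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def committee_predict(X_train, y_train, X_test, n_models=16, frac=0.5, seed=0):
    """Mean and spread of models trained on random subsamples (no replacement)."""
    rng = np.random.default_rng(seed)
    n_sub = int(frac * len(y_train))
    preds = []
    for _ in range(n_models):
        idx = rng.choice(len(y_train), size=n_sub, replace=False)
        preds.append(X_test @ fit_ridge(X_train[idx], y_train[idx]))
    preds = np.array(preds)  # shape (n_models, n_test)
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)


def calibrate_alpha2(y_val, mean_val, std_val):
    # maximum-likelihood scale for y ~ N(mean, alpha^2 * sigma^2)
    return np.mean((y_val - mean_val) ** 2 / std_val**2)


rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + 0.1 * rng.normal(size=200)  # hypothetical target
X = np.vander(x, 6)  # polynomial features up to x^5
train, val = slice(0, 150), slice(150, 200)
mean_val, std_val = committee_predict(X[train], y[train], X[val])
alpha2 = calibrate_alpha2(y[val], mean_val, std_val)
calibrated_std = np.sqrt(alpha2) * std_val  # rescaled uncertainty estimate
print(alpha2)  # > 1 means the raw committee spread underestimates the error
```

Calibration needs only the committee predictions on held-out points. This is what keeps the estimate cheap once the committee exists.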
By representing elements as points in a low-dimensional chemical space it is possible to improve the performance of a machine-learning model for a chemically diverse dataset. The resulting coordinates are reminiscent of the main groups of the periodic table.
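One way to read "elements as points in a low-dimensional chemical space" is sketched below. Each element gets a d-dimensional coordinate vector, and the per-element density channels are mixed into d "pseudo-element" channels with those coordinates as weights. Everything here is an illustrative assumption: the function name, the random coefficients, and the random coordinates. In practice the coordinates would be optimized for regression performance rather than drawn at random. The check shows the key property: two elements placed at the same point become indistinguishable to the model.

```python
import numpy as np


def compress_species(coeffs, u):
    """Mix per-element density channels into d pseudo-element channels.

    coeffs: (n_elements, n_max, n_lm) hypothetical density coefficients,
            one channel per chemical element.
    u:      (d, n_elements); column a holds element a's coordinates in chemical space.
    Returns a (d, n_max, n_lm) array.
    """
    return np.einsum("ja,anm->jnm", u, coeffs)


rng = np.random.default_rng(0)
n_elements, n_max, n_lm, d = 5, 4, 9, 2
coeffs = rng.normal(size=(n_elements, n_max, n_lm))
u = rng.normal(size=(d, n_elements))
u[:, 3] = u[:, 2]  # place elements 2 and 3 at the same point in chemical space
swapped = coeffs[[0, 1, 3, 2, 4]]  # exchange the density channels of elements 2 and 3
c1, c2 = compress_species(coeffs, u), compress_species(swapped, u)
print(c1.shape, np.allclose(c1, c2))  # (2, 4, 9) True
```

With d channels instead of n_elements, species-resolved power spectra shrink from roughly n_elements² to d² channel pairs. This is the practical payoff for chemically diverse datasets.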
We develop a path-integral dynamics method for water that resembles centroid molecular dynamics (CMD), except that the centroids are averages of curvilinear, rather than cartesian, bead coordinates. The curvilinear coordinates are used explicitly only when computing the potential of mean force, the components of which are re-expressed in terms of cartesian 'quasi-centroids' (so-called because they are close to the cartesian centroids). Cartesian equations of motion are obtained by making small approximations to the quantum Boltzmann distribution. Simulations of the infrared spectra of various water models over 150-600 K show these approximations to be justified: for a two-dimensional OH-bond model, the quasi-centroid molecular dynamics (QCMD) spectra lie close to the exact quantum spectra, and almost on top of the Matsubara dynamics spectra; for gas-phase water, the QCMD spectra are close to the exact quantum spectra; for liquid water and ice (using the q-TIP4P/F surface), the QCMD spectra are close to the CMD spectra at 600 K, and line up with the results of thermostatted ring-polymer molecular dynamics and approximate quantum calculations at 300 and 150 K. The QCMD spectra show no sign of the CMD 'curvature problem' (of erroneous red shifts and broadening). In the liquid and ice simulations, the potential of mean force was evaluated on-the-fly by generalising an adiabatic CMD algorithm to curvilinear coordinates; the full limit of adiabatic separation needed to be taken, which made the QCMD calculations 8 times more expensive than partially adiabatic CMD at 300 K, and 32 times at 150 K (and the intensities may still not be converged at this temperature). The QCMD method is probably generalisable to many other systems, provided collective bead-coordinates can be identified that yield compact mean-field ring-polymer distributions.
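The difference between a cartesian centroid and a curvilinear quasi-centroid can be seen in a two-dimensional OH-bond picture like the model mentioned above. Suppose the ring-polymer beads of the H atom are spread along an arc of fixed bond length. Their cartesian average then lies inside the arc, so the centroid bond looks artificially short. This geometric effect is the one usually associated with the curvature problem. Averaging the polar coordinates (radius and angle) instead keeps the quasi-centroid on the arc. The sketch below is a hypothetical simplification: the O atom sits at the origin, the bead configuration is made up, and the angles are assumed not to wrap around ±π.

```python
import numpy as np


def cartesian_centroid(beads):
    # beads: (n_beads, 2) positions of the H atom relative to O at the origin
    return beads.mean(axis=0)


def polar_quasi_centroid(beads):
    """Average bead radius and angle, then map the averages back to cartesian."""
    r = np.hypot(beads[:, 0], beads[:, 1])
    theta = np.arctan2(beads[:, 1], beads[:, 0])  # assumes no wrap at +/- pi
    r_bar, theta_bar = r.mean(), theta.mean()
    return np.array([r_bar * np.cos(theta_bar), r_bar * np.sin(theta_bar)])


n_beads, bond = 32, 1.8  # hypothetical bond length in bohr
angles = 0.6 * np.sin(2 * np.pi * np.arange(n_beads) / n_beads)  # spread along the arc
beads = bond * np.column_stack([np.cos(angles), np.sin(angles)])
r_cart = np.linalg.norm(cartesian_centroid(beads))
r_quasi = np.linalg.norm(polar_quasi_centroid(beads))
print(r_cart)   # ~1.64: pulled inside the arc
print(r_quasi)  # equal to the bond length, 1.8
```

In the water simulations the curvilinear coordinates are bond lengths and the HOH angle rather than a single polar pair. The principle of averaging in curvilinear coordinates and mapping back to cartesian space is the same.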
Physically motivated and mathematically robust atom-centered representations of molecular structures are key to the success of modern atomistic machine learning. They lie at the foundation of a wide range of methods to predict the properties of both materials and molecules and to explore and visualize their chemical structures and compositions. Recently, it has become clear that many of the most effective representations share a fundamental formal connection. They can all be expressed as a discretization of n-body correlation functions of the local atom density, suggesting the opportunity of standardizing and, more importantly, optimizing their evaluation. We present an implementation, named librascal, whose modular design lends itself both to developing refinements to the density-based formalism and to rapid prototyping for new developments of rotationally equivariant atomistic representations. As an example, we discuss smooth overlap of atomic positions (SOAP) features, perhaps the most widely used member of this family of representations, to show how the expansion of the local density can be optimized for any choice of radial basis sets. We discuss the representation in the context of a kernel ridge regression model, commonly used with SOAP features, and analyze how the computational effort scales for each of the individual steps of the calculation. By applying data reduction techniques in feature space, we show how to reduce the total computational cost by a factor of up to 4 without affecting the model's symmetry properties and without significantly impacting its accuracy.
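The pipeline described above (invariant features, kernel ridge regression, feature-space data reduction) can be sketched in a few lines of NumPy. This does not use librascal's API. The features are hypothetical random stand-ins for SOAP power-spectrum vectors. The reduction step uses farthest-point sampling over feature columns, which is one common choice; whether it matches the selection scheme used in the paper is an assumption. Selecting a subset of invariant features keeps every remaining feature invariant, so the model's symmetry is unaffected.

```python
import numpy as np


def fps_columns(X, n_select, first=0):
    """Farthest-point sampling over feature columns (a simple data-reduction step)."""
    cols = X.T
    selected = [first]
    dist = np.linalg.norm(cols - cols[first], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(dist))  # column farthest from everything selected so far
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(cols - cols[nxt], axis=1))
    return np.array(selected)


def polynomial_kernel(A, B, zeta=2):
    # normalized dot-product kernel raised to a power, as often used with SOAP
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (A @ B.T) ** zeta


def krr_fit_predict(X_train, y_train, X_test, reg=1e-3, zeta=2):
    K = polynomial_kernel(X_train, X_train, zeta)
    weights = np.linalg.solve(K + reg * np.eye(len(K)), y_train)
    return polynomial_kernel(X_test, X_train, zeta) @ weights


rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(120, 200)))  # stand-in for power-spectrum features
y = X[:, :10].sum(axis=1)  # hypothetical target
keep = fps_columns(X, 50)
pred_full = krr_fit_predict(X[:100], y[:100], X[100:])
pred_red = krr_fit_predict(X[:100, keep], y[:100], X[100:, keep])
print(np.sqrt(np.mean((pred_full - y[100:]) ** 2)), np.sqrt(np.mean((pred_red - y[100:]) ** 2)))
```

Kernel evaluation cost grows linearly with the number of features, so keeping 50 of 200 columns cuts that part of the work roughly fourfold. How much accuracy survives depends on the data.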