Evaluating the (dis)similarity of crystalline, disordered and molecular compounds is a critical step in the development of algorithms that automatically navigate the configuration space of complex materials. For instance, a structural similarity metric is crucial for classifying structures, for searching chemical space for better compounds and materials, and for driving the next generation of machine-learning techniques that predict the stability and properties of molecules and materials. In the last few years, several strategies have been designed to compare atomic coordination environments. In particular, driven by the design of various classes of machine-learned interatomic potentials, the Smooth Overlap of Atomic Positions (SOAP) has emerged as a natural framework for obtaining translation-, rotation- and permutation-invariant descriptors of groups of atoms. Here we discuss how such local descriptors can be combined using a Regularized Entropy Match (REMatch) approach to describe the similarity of both whole molecular and bulk periodic structures. This yields powerful metrics for navigating alchemical and structural complexity within a unified framework. Furthermore, using this kernel together with ridge regression, we can predict atomization energies for a database of small organic molecules with a mean absolute error below 1 kcal/mol, reaching an important milestone in the application of machine-learning techniques to the evaluation of molecular properties.
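The REMatch step can be illustrated with a Sinkhorn-type iteration. The sketch below assumes that the matrix `C` of normalized local-environment similarities (for instance SOAP kernels between each environment of structure A and each environment of structure B) has already been computed. The function name, the default `gamma` and the convergence settings are illustrative choices, not values taken from the paper. The code finds the entropy-regularized matching matrix P, whose rows sum to 1/N and whose columns sum to 1/M, and returns the weighted similarity sum of P_ij C_ij.

```python
import numpy as np

def rematch_kernel(C, gamma=0.1, tol=1e-10, max_iter=10000):
    """Entropy-regularized match between two sets of local environments.

    C[i, j] is a normalized environment kernel (values in [0, 1]) between
    environment i of structure A and environment j of structure B.
    Returns sum_ij P_ij * C_ij, where P minimizes
        sum_ij P_ij * (1 - C_ij) + gamma * sum_ij P_ij * ln(P_ij)
    subject to rows summing to 1/N and columns summing to 1/M.
    """
    C = np.asarray(C, dtype=float)
    n, m = C.shape
    # Gibbs kernel: well-matched environment pairs receive larger weights
    K = np.exp(-(1.0 - C) / gamma)
    u = np.full(n, 1.0 / n)
    v = np.full(m, 1.0 / m)
    for _ in range(max_iter):
        u_prev = u
        # Sinkhorn updates: alternately enforce the row and column marginals
        u = (1.0 / n) / (K @ v)
        v = (1.0 / m) / (K.T @ u)
        if np.max(np.abs(u - u_prev)) < tol * np.max(np.abs(u)):
            break
    # Matching matrix P = diag(u) K diag(v)
    P = u[:, None] * K * v[None, :]
    return float(np.sum(P * C))
```

As `gamma` goes to zero, P approaches the single best assignment of environments. As `gamma` grows, P tends to the uniform matrix, and the kernel tends to the plain average of all C_ij.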
Statistical learning based on a local representation of atomic structures provides a universal model of chemical stability.
Nuclear quantum effects influence the structure and dynamics of hydrogen-bonded systems such as water, affecting their observed properties to widely varying degrees. This review highlights recent significant developments in the experiment, theory and simulation of nuclear quantum effects in water. Novel experimental techniques, such as deep inelastic neutron scattering, now provide a detailed view of the role of nuclear quantum effects in water's properties. These have been combined with theoretical developments, such as the introduction of the competing quantum effects principle, which explains the subtle interplay of water's quantum effects and their manifestation in experimental observables. We discuss how this principle has recently been used to explain the apparent dichotomy in water's isotope effects, which can range from very large to almost nonexistent depending on the property and conditions. We then review the latest major developments in simulation algorithms and theory that have enabled the efficient inclusion of nuclear quantum effects in molecular simulations, permitting their combination with on-the-fly evaluation of the potential energy surface using electronic structure theory. Finally, we identify current challenges and future opportunities in the area.
We provide an introduction to Gaussian process regression (GPR) machine-learning methods in computational materials science and chemistry. The focus of the present review is on the regression of atomistic properties: in particular, on the construction of interatomic potentials, or force fields, in the Gaussian Approximation Potential (GAP) framework; beyond this, we also discuss the fitting of arbitrary scalar, vectorial, and tensorial quantities. Methodological aspects of reference data generation, representation, and regression, as well as the question of how a data-driven model may be validated, are reviewed and critically discussed. A survey of applications to a variety of research questions in chemistry and materials science illustrates the rapid growth in the field. A vision is outlined for the development of the methodology in the years to come.
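The regression step at the core of this framework can be illustrated with a plain, dense GPR. The sketch below is not the GAP implementation. It assumes a squared-exponential kernel on generic feature vectors, whereas GAP models typically use structural descriptors such as SOAP together with sparse approximations. The hyperparameter values are placeholders. The posterior mean is K_* (K + s^2 I)^-1 y and the posterior variance is k_** - K_* (K + s^2 I)^-1 K_*^T.

```python
import numpy as np

def se_kernel(X1, X2, length_scale=1.0, sigma_f=1.0):
    """Squared-exponential covariance between the rows of X1 and X2."""
    d2 = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return sigma_f**2 * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

def gpr_fit_predict(X_train, y_train, X_test, noise=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the rows of X_test."""
    K = se_kernel(X_train, X_train, **kernel_args)
    K += noise**2 * np.eye(len(X_train))   # observation noise on the diagonal
    L = np.linalg.cholesky(K)              # K = L L^T
    # alpha = (K + s^2 I)^-1 y via two triangular-structured solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = se_kernel(X_test, X_train, **kernel_args)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(se_kernel(X_test, X_test, **kernel_args)) - np.sum(v**2, axis=0)
    return mean, var
```

The Cholesky factorization avoids forming an explicit inverse. The predicted variance shrinks near training points and returns to `sigma_f**2` far from the data, which is one reason GPR uncertainty estimates are useful when validating data-driven models.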
We propose sketch-map, a new scheme for obtaining a low-dimensional representation of the region of phase space explored during an enhanced dynamics simulation. We show evidence, from an examination of the distribution of pairwise distances between frames, that some features of the free-energy surface are inherently high-dimensional. This makes dimensionality reduction problematic, because the data do not satisfy the assumptions made in conventional manifold learning algorithms. We therefore propose that, when dimensionality reduction is performed on trajectory data, one should think of the resulting embedding as a quickly sketched set of directions rather than a road map. In other words, the embedding tells one about the connectivity between states but does not provide the vectors that correspond to the slow degrees of freedom. This realization informs the development of sketch-map, which endeavors to reproduce the proximity information from the high-dimensional description in a space of lower dimensionality, even when a faithful embedding is not possible.
nonlinear dimensionality reduction | proteins | molecular dynamics
The dynamics of many of the molecules that appear in biology, materials science, and chemistry are highly complex. These molecules can undergo transitions involving large numbers of atoms between an enormous number of different configurations (1), which makes it difficult to comprehend these motions using only chemical intuition. Nevertheless, these data contain a great deal of correlation, and there is a strong body of evidence that the energetically accessible regions of phase space lie on a structure of low dimensionality (2-6). Therefore, low-dimensional representations of the free-energy surface can give meaningful insight into phenomena and can provide collective variables (CVs) that can be used to accelerate the dynamics and to reconstruct the free-energy landscape.
Methods exist for extracting this low-dimensional structure by postprocessing the results of long unbiased molecular dynamics trajectories in which the entire landscape is explored (3, 6-8). Unfortunately, for many systems, in particular in atomistic simulations, obtaining information on interesting, long-time-scale motions from unbiased simulations requires heroic amounts of computational time (9). For these problems one would therefore ideally like to use dimensionality reduction in tandem with accelerated sampling. This has to work both ways: the method must be able to analyze data from accelerated sampling simulations on very rough free-energy surfaces, and it should produce a mapping of phase space that can serve as an optimized, bespoke set of CVs for calculations that extract quantitative free energies.
Experiments have shown that the low-free-energy part of phase space has a complex structure with a nonuniform dimensionality (8), that it is nonlinear (2, 4), that it is nonuniformly sampled (8, 10), and that it is possibly fractal (4, 11). It therefore seems likely tha...
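In the published sketch-map scheme, proximity is compared through sigmoid functions of the distances rather than through the distances themselves. Each sigmoid has the form s(r) = 1 - (1 + (2^(a/b) - 1)(r/sigma)^a)^(-b/a), which equals 1/2 at r = sigma. The stress to be minimized is a weighted sum of squared differences between the transformed high-dimensional distances R_ij and the transformed low-dimensional distances r_ij. The minimal sketch below only evaluates that objective; it does not minimize it. The default sigma and exponents are placeholders that would need tuning for real trajectory data.

```python
import numpy as np

def sketch_sigmoid(r, sigma, a, b):
    """s(r) = 1 - (1 + (2**(a/b) - 1) * (r/sigma)**a) ** (-b/a); s(sigma) = 0.5."""
    r = np.asarray(r, dtype=float)
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sigma) ** a) ** (-b / a)

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))

def sketchmap_stress(X_high, x_low, sigma=1.0, A=2, B=6, a=2, b=6, weights=None):
    """Weighted mean squared mismatch of sigmoid-transformed distances.

    A, B are the high-dimensional exponents and a, b the low-dimensional ones
    (placeholder defaults, not values from the paper).
    """
    R = pairwise_distances(X_high)
    r = pairwise_distances(x_low)
    diff = sketch_sigmoid(R, sigma, A, B) - sketch_sigmoid(r, sigma, a, b)
    iu = np.triu_indices(len(X_high), k=1)   # each pair counted once
    w = np.ones(len(iu[0])) if weights is None else weights[iu]
    return float(np.sum(w * diff[iu] ** 2) / np.sum(w))
```

Because the sigmoids saturate at 0 and 1, very short and very long distances contribute little to the mismatch. This reflects the idea of reproducing connectivity rather than a faithful metric embedding.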
Recently, we have shown how a colored-noise Langevin equation can be used in molecular dynamics to obtain trajectories whose properties are tailored to display desired sampling features. In the present paper, after reviewing some analytical results for the stochastic differential equations that form the basis of our approach, we describe in detail the implementation of the generalized Langevin equation thermostat and the fitting procedure used to obtain optimal parameters. We then discuss the simulation of nuclear quantum effects and demonstrate that, by carefully choosing parameters, one can successfully model strongly anharmonic solids such as neon. For the reader's convenience, a library of thermostat parameters and some demonstrative code can be downloaded from an online repository.
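In a generalized Langevin equation thermostat, the physical momentum p is augmented with auxiliary momenta. The extended vector evolves under a linear stochastic equation with drift matrix A_p. This equation can be integrated exactly over half a time step as p <- T p + S xi, with T = exp(-A_p dt/2) and S S^T = C_p - T C_p T^T, where C_p is the stationary covariance. The sketch below implements only this thermostat step for a free particle. It assumes unit mass and canonical sampling (C_p = kT times the identity). The 2x2 drift matrix used in the usage check is a hypothetical example, not a fitted parameter set from the online library. The matrix exponential uses a simple scaling-and-squaring Taylor series to avoid extra dependencies.

```python
import numpy as np

def expm_taylor(M, terms=20):
    """Matrix exponential by scaling and squaring a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    Ms = M / 2.0**s                 # scale so that the series converges fast
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ Ms / k
        E = E + term
    for _ in range(s):              # undo the scaling by repeated squaring
        E = E @ E
    return E

def gle_propagator(A, dt, kT):
    """T and S for the half-step p <- T p + S xi, assuming C_p = kT * I."""
    n = A.shape[0]
    T = expm_taylor(-0.5 * dt * A)
    C = kT * np.eye(n)
    SS = C - T @ C @ T.T
    # Symmetric square root via eigendecomposition; clip round-off negatives
    w, V = np.linalg.eigh(0.5 * (SS + SS.T))
    S = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    return T, S

def gle_step(P, T, S, rng):
    """Apply the thermostat half-step to a batch of extended momenta (walkers x n)."""
    xi = rng.standard_normal(P.shape)
    return P @ T.T + xi @ S.T
```

For a 1x1 drift matrix [[gamma]], the step reduces to the ordinary white-noise Langevin update with T = exp(-gamma dt/2). A colored-noise thermostat differs only in using a larger, non-diagonal drift matrix.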
Statistical learning methods show great promise for accurately predicting materials and molecular properties while minimizing the need for computationally demanding electronic structure calculations. The accuracy and transferability of these models are increased significantly by encoding into the learning procedure the fundamental symmetries of rotational and permutational invariance of scalar properties. However, the prediction of tensorial properties requires that the model respects the appropriate geometric transformations, rather than invariance, when the reference frame is rotated. We introduce a formalism that extends existing schemes and makes it possible to perform machine learning of tensorial properties of arbitrary rank for general molecular geometries. To demonstrate it, we derive a tensor kernel adapted to rotational symmetry, which is the natural generalization of the smooth overlap of atomic positions kernel commonly used for the prediction of scalar properties at the atomic scale. The performance and generality of the approach are demonstrated by learning the instantaneous response to an external electric field of water oligomers of increasing complexity, from the isolated molecule to the condensed phase.
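The difference between invariance and covariance can be made concrete with a simple numerical check. The function `toy_polarizability` below is a hypothetical toy model, not the tensor kernel introduced in the paper. It builds a rank-2, polarizability-like tensor from interatomic unit vectors, which makes it covariant by construction. When the input coordinates are rotated by R, the prediction must transform as R alpha R^T, while its trace (a scalar) must stay invariant.

```python
import numpy as np

def toy_polarizability(X, r0=1.0):
    """Hypothetical rank-2 tensor built from interatomic unit vectors.

    Each pair contributes exp(-r/r0) * u u^T, so the result rotates as R a R^T.
    """
    alpha = np.eye(3)  # isotropic part (arbitrary constant)
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            d = X[j] - X[i]
            r = np.linalg.norm(d)
            u = d / r
            alpha += np.exp(-r / r0) * np.outer(u, u)
    return alpha

def rotation_matrix(axis, angle):
    """Rotation about `axis` by `angle` (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * K @ K
```

A scalar model only needs to pass the invariance check. A tensorial model must pass the stronger covariance check for every rotation, and this constraint is what a symmetry-adapted tensor kernel builds in.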
The path integral molecular dynamics (PIMD) method provides a convenient way to compute the quantum mechanical structural and thermodynamic properties of condensed phase systems, at the expense of introducing an additional set of high-frequency normal modes on top of the physical vibrations of the system. Efficiently sampling such a wide range of frequencies poses a considerable thermostatting challenge. Here we introduce a simple stochastic path integral Langevin equation (PILE) thermostat that exploits analytical knowledge of the free path integral normal mode frequencies. We also apply a recently developed colored-noise thermostat based on a generalized Langevin equation (GLE), which automatically achieves a similar, frequency-optimized sampling. The sampling efficiencies of these thermostats are compared with that of the more conventional Nosé-Hoover chain (NHC) thermostat for a number of physically relevant properties of liquid water and of the hydrogen-in-palladium system. In nearly every case, the new PILE thermostat is found to perform just as well as the NHC thermostat while allowing for a computationally more efficient implementation. The GLE thermostat also proves to be very robust, delivering near-optimal sampling efficiency in all of the cases considered. We suspect that these simple stochastic thermostats will therefore find useful application in many future PIMD simulations.
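The analytical knowledge that PILE exploits is the closed-form spectrum of the free ring polymer: omega_k = 2 omega_n sin(k pi / n), with omega_n = n / (beta hbar) for n beads. Each non-centroid mode is critically damped with friction gamma_k = 2 omega_k, while the centroid friction is left as a free parameter. The sketch below assumes units with hbar = k_B = 1 and a hypothetical centroid time constant `tau0`. It shows only the Langevin half-step applied to normal-mode momenta; the normal-mode transform and the velocity Verlet steps are omitted.

```python
import numpy as np

def pile_parameters(n_beads, beta, dt, tau0=100.0, hbar=1.0):
    """Free ring-polymer frequencies, PILE frictions and Langevin coefficients."""
    omega_n = n_beads / (beta * hbar)
    k = np.arange(n_beads)
    omega_k = 2.0 * omega_n * np.sin(k * np.pi / n_beads)
    gamma = 2.0 * omega_k          # critical damping for non-centroid modes
    gamma[0] = 1.0 / tau0          # centroid friction: user choice (hypothetical value)
    c1 = np.exp(-0.5 * dt * gamma)
    c2 = np.sqrt(1.0 - c1**2)
    return omega_k, gamma, c1, c2

def pile_thermostat_step(p_nm, c1, c2, mass, beta, n_beads, rng):
    """Langevin half-step on normal-mode momenta p_nm of shape (n_beads, n_dof)."""
    # Thermal momentum width at the ring-polymer temperature, beta_n = beta / n
    sigma = np.sqrt(mass * n_beads / beta)
    return c1[:, None] * p_nm + sigma * c2[:, None] * rng.standard_normal(p_nm.shape)
```

Because each mode's friction is matched to its own frequency, no single global thermostat has to span the whole range from the physical vibrations to the stiff internal ring-polymer modes.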