Recursive evaluation and iterative contraction of N-body equivariant features

Mapping an atomistic configuration to a symmetrized N-point correlation of a field associated with the atomic positions (e.g., an atomic density) has emerged as an elegant and effective solution to represent structures as the input of machine-learning algorithms. While it has become clear that low-order density correlations do not provide a complete representation of an atomic environment, the exponential increase in the number of possible N-body invariants makes it difficult to design a concise and effective representation. We discuss how to exploit recursion relations between equivariant features of different order (generalizations of N-body invariants that provide a complete representation of the symmetries of improper rotations) to compute high-order terms efficiently. In combination with the automatic selection of the most expressive combination of features at each order, this approach provides a conceptual and practical framework to generate systematically improvable, symmetry-adapted representations for atomistic machine learning.

Optimal radial basis for density-based atomic representations

Goscinski

Musil

Pozdnyakov

et al. 2021

The input of almost every machine learning algorithm targeting the properties of matter at the atomic scale involves a transformation of the list of Cartesian atomic coordinates into a more symmetric representation. Many of the most popular representations can be seen as an expansion of the symmetrized correlations of the atom density and differ mainly by the choice of basis. Considerable effort has been dedicated to the optimization of the basis set, typically driven by heuristic considerations on the behavior of the regression target. Here, we take a different, unsupervised viewpoint, aiming to determine the basis that encodes in the most compact way possible the structural information that is relevant for the dataset at hand. For each training dataset and number of basis functions, one can build a unique basis that is optimal in this sense and can be computed at no additional cost with respect to the primitive basis by approximating it with splines. We demonstrate that this construction yields representations that are accurate and computationally efficient, particularly when working with representations that correspond to high-body order correlations. We present examples that involve both molecular and condensed-phase machine-learning models.

Unified theory of atom-centered representations and message-passing machine-learning schemes

Nigam

Pozdnyakov

Fraux

et al. 2022

Data-driven schemes that associate molecular and crystal structures with their microscopic properties share the need for a concise, effective description of the arrangement of their atomic constituents. Many types of models rely on descriptions of atom-centered environments, which are associated with an atomic property or with an atomic contribution to an extensive macroscopic quantity. Frameworks in this class can be understood in terms of atom-centered density correlations (ACDC), which are used as a basis for a body-ordered, symmetry-adapted expansion of the targets. Several other schemes, which gather information on the relationship between neighboring atoms using "message-passing" ideas, cannot be directly mapped to correlations centered around a single atom. We generalize the ACDC framework to include multi-centered information, generating representations that provide a complete linear basis to regress symmetric functions of atomic coordinates and a coherent foundation to systematize our understanding of both atom-centered and message-passing, invariant and equivariant machine-learning schemes.

Local invertibility and sensitivity of atomic structure-feature mappings

Pozdnyakov

Zhang

Ortner

et al. 2021

Open Res Europe

Background: The increasingly common applications of machine-learning schemes to atomic-scale simulations have triggered efforts to better understand the mathematical properties of the mapping between the Cartesian coordinates of the atoms and the variety of representations that can be used to convert them into a finite set of symmetric descriptors or features. Methods: Here, we analyze the sensitivity of the mapping to atomic displacements, using a singular value decomposition of the Jacobian of the transformation to quantify the sensitivity for different configurations, choices of representation and implementation details. Results: We show that the combination of symmetry and smoothness leads to mappings that have singular points at which the Jacobian has one or more null singular values (besides those corresponding to infinitesimal translations and rotations). This is in fact desirable, because it enforces physical symmetry constraints on the values predicted by regression models constructed using such representations. However, besides these symmetry-induced singularities, there are also spurious singular points, which we find to be linked to the incompleteness of the mapping, i.e. the fact that, for certain classes of representations, structurally distinct configurations are not guaranteed to be mapped onto different feature vectors. Additional singularities can be introduced by an overly aggressive truncation of the infinite basis set that is used to discretize the representations. Conclusions: These results exemplify the subtle issues that arise when constructing symmetric representations of atomic structures, and provide conceptual and numerical tools to identify and investigate them in both benchmark and realistic applications.
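The Jacobian analysis described in the Methods can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the feature map is a simple Gaussian-smeared histogram of pair distances, not one of the representations studied in the paper, and the function names, grid, smearing width and test configuration are hypothetical.

```python
import numpy as np

def pair_features(positions, grid, sigma=0.1):
    """Gaussian-smeared histogram of all pair distances (a toy 2-body invariant)."""
    n = len(positions)
    feats = np.zeros(len(grid))
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            feats += np.exp(-0.5 * ((grid - r) / sigma) ** 2)
    return feats

def pair_features_jacobian(positions, grid, sigma=0.1):
    """Analytic Jacobian d(features)/d(positions), shape (len(grid), 3 * n)."""
    n = len(positions)
    jac = np.zeros((len(grid), n, 3))
    for i in range(n):
        for j in range(i + 1, n):
            d = positions[i] - positions[j]
            r = np.linalg.norm(d)
            # Derivative of each Gaussian with respect to the pair distance r.
            dg_dr = np.exp(-0.5 * ((grid - r) / sigma) ** 2) * (grid - r) / sigma**2
            # Chain rule: dr/dx_i = d / r and dr/dx_j = -d / r.
            jac[:, i, :] += np.outer(dg_dr, d / r)
            jac[:, j, :] -= np.outer(dg_dr, d / r)
    return jac.reshape(len(grid), 3 * n)

def sensitivity_spectrum(positions, grid, sigma=0.1):
    """Singular values of the Jacobian, sorted from largest to smallest."""
    jac = pair_features_jacobian(positions, grid, sigma)
    return np.linalg.svd(jac, compute_uv=False)

# Hypothetical four-atom, non-planar configuration with distinct pair distances.
positions = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.3, 1.2, 0.0],
    [0.7, 0.4, 1.5],
])
grid = np.linspace(0.5, 2.5, 60)

svals = sensitivity_spectrum(positions, grid)
n_null = int(np.sum(svals < 1e-8 * svals[0]))
print("singular values:", svals)
print("null singular values:", n_null)
```

For a generic non-collinear configuration, one expects six null singular values from this invariant map, matching the three infinitesimal translations and three rotations. Any further null or very small singular values would flag the kind of singular point the abstract discusses.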

Comment on “Manifolds of quasi-constant SOAP and ACSF fingerprints and the resulting failure to machine learn four-body interactions” [J. Chem. Phys. 156, 034302 (2022)]

Pozdnyakov

Willatt

Bartók-Pártay

et al. 2022

The "quasi-constant" SOAP and ACSF fingerprint manifolds recently discovered by Parsaeifard and Goedecker [J. Chem. Phys. 156, 034302 (2022)] are closely related to the degenerate pairs of configurations that are a known shortcoming of all low-body-order atom-density correlation representations of molecular structures. Configurations that are rigorously singular, which we demonstrate can only occur in finite, discrete sets and not as continuous manifolds, determine the complete failure of machine-learning models built on this class of descriptors. The "quasi-constant" manifolds, on the other hand, exhibit low but non-zero sensitivity to atomic displacements. As a consequence, for any such manifold, it is possible to optimize the model parameters and the training set to mitigate their impact on learning, even though this is often impractical and it is preferable to use descriptors that avoid both the exact singularities and the associated numerical instability.

Incompleteness of graph neural networks for points clouds in three dimensions

Pozdnyakov

Ceriotti

2022

Mach. Learn.: Sci. Technol.

Graph neural networks (GNN) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct a graph representation for molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a chosen cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph NNs" (dGNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML, with all known indistinguishable configurations being resolved in the fully-connected limit, which is equivalent to infinite or sufficiently large cutoff. Here we present a counterexample that proves that dGNNs are not complete even for the restricted case of fully-connected graphs induced by 3D atom clouds. We construct pairs of distinct point clouds whose associated graphs are, for any cutoff radius, equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, both for isolated structures and for infinite structures that are periodic in 1, 2, and 3 dimensions. The existence of indistinguishable configurations sets an ultimate limit to the expressive power of some of the well-established GNN architectures for atomistic machine learning. Models that explicitly use angular or directional information in the description of atomic environments can resolve this class of degeneracies.
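A first-order Weisfeiler-Lehman test on a distance-decorated graph can be sketched as follows. This is a simplified illustration under stated assumptions: all atoms share one species, distances are rounded before comparison, and the function name `wl_signature` and its parameters are hypothetical. The hexagon and two-triangle pair is a standard finite-cutoff example chosen for illustration; it is not the counterexample constructed in the paper, and it is resolved once the cutoff is large enough.

```python
import numpy as np
from collections import Counter

def wl_signature(positions, cutoff, n_iter=3, decimals=6):
    """Multiset of vertex colors after first-order WL refinement on a distance graph."""
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    colors = [0] * n  # identical initial colors: a single atomic species is assumed
    for _ in range(n_iter):
        new_colors = []
        for i in range(n):
            # Sorted multiset of (edge distance, neighbor color) within the cutoff.
            neigh = sorted(
                (round(float(dist[i, j]), decimals), colors[j])
                for j in range(n)
                if j != i and dist[i, j] <= cutoff
            )
            # Nested tuples serve as colors so they compare across structures.
            new_colors.append((colors[i], tuple(neigh)))
        colors = new_colors
    return Counter(colors)

# Regular hexagon with unit side length.
hexagon = [(np.cos(k * np.pi / 3), np.sin(k * np.pi / 3), 0.0) for k in range(6)]

# Two unit equilateral triangles, stacked 5 length units apart.
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, np.sqrt(3) / 2, 0.0)]
two_triangles = triangle + [(x, y, 5.0) for x, y, _ in triangle]

# Short cutoff: every atom sees exactly two neighbors at distance 1 in both structures.
same_short = wl_signature(hexagon, 1.1) == wl_signature(two_triangles, 1.1)
# Large cutoff: the graphs become fully connected and the distance lists differ.
same_long = wl_signature(hexagon, 10.0) == wl_signature(two_triangles, 10.0)
print("indistinguishable at cutoff 1.1:", same_short)
print("indistinguishable at cutoff 10.0:", same_long)
```

Using nested tuples as colors keeps the sketch short, at the price of colors that grow with each iteration. A practical implementation would typically relabel colors with integers from a palette shared between the structures being compared.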

Incompleteness of graph neural networks for points clouds in three dimensions

Pozdnyakov

Ceriotti

2022

Preprint

Sergey Pozdnyakov

Incompleteness of Atomic Structure Representations

Recursive evaluation and iterative contraction of N-body equivariant features

Optimal radial basis for density-based atomic representations

Unified theory of atom-centered representations and message-passing machine-learning schemes

Local invertibility and sensitivity of atomic structure-feature mappings

Comment on “Manifolds of quasi-constant SOAP and ACSF fingerprints and the resulting failure to machine learn four-body interactions” [J. Chem. Phys. 156, 034302 (2022)]

Incompleteness of graph neural networks for points clouds in three dimensions
