Evaluating the (dis)similarity of crystalline, disordered and molecular compounds is a critical step in the development of algorithms to automatically navigate the configuration space of complex materials. For instance, a structural similarity metric is crucial for classifying structures, searching chemical space for better compounds and materials, and driving the next generation of machine-learning techniques for predicting the stability and properties of molecules and materials. In the last few years several strategies have been designed to compare atomic coordination environments. In particular, the Smooth Overlap of Atomic Positions (SOAP) has emerged as a natural framework to obtain translation-, rotation- and permutation-invariant descriptors of groups of atoms, driven by the design of various classes of machine-learned inter-atomic potentials. Here we discuss how one can combine such local descriptors using a Regularized Entropy Match (REMatch) approach to describe the similarity of both whole molecular and bulk periodic structures, introducing powerful metrics that allow the navigation of alchemical and structural complexity within a unified framework. Furthermore, using this kernel and a ridge regression method we can also predict atomization energies for a database of small organic molecules with a mean absolute error below 1 kcal/mol, reaching an important milestone in the application of machine-learning techniques to the evaluation of molecular properties.
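As a rough illustration of how local similarities can be combined into a global one, the sketch below performs an entropy-regularized matching between two sets of atomic environments, solved with Sinkhorn iterations. It assumes the matrix of local kernels (for example, SOAP kernels between every environment of structure A and every environment of structure B) has already been computed. The function name `rematch_kernel`, the parameter `gamma`, and the toy numbers are illustrative assumptions, not taken from the abstract above.

```python
import numpy as np

def rematch_kernel(C, gamma=0.5, n_iter=500, tol=1e-10):
    """Entropy-regularized match between two sets of environments.

    C[i, j] is an assumed, precomputed local kernel in [0, 1] between
    environment i of structure A and environment j of structure B.
    Small gamma approaches a best-match assignment; large gamma
    approaches a plain average over all pairs.
    """
    C = np.asarray(C, dtype=float)
    n, m = C.shape
    K = np.exp(-(1.0 - C) / gamma)      # Gibbs kernel of the cost 1 - C
    r = np.full(n, 1.0 / n)             # required row sums of P
    c = np.full(m, 1.0 / m)             # required column sums of P
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):             # Sinkhorn scaling iterations
        u_new = r / (K @ v)
        v = c / (K.T @ u_new)
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    P = u[:, None] * K * v[None, :]     # regularized transport plan
    return float(np.sum(P * C))

# Toy local-kernel matrix (synthetic values, for illustration only).
C_toy = np.array([[1.0, 0.2],
                  [0.2, 1.0]])
k_match = rematch_kernel(C_toy, gamma=0.01)   # close to best-match limit
k_average = rematch_kernel(C_toy, gamma=100)  # close to average limit
```

The regularization parameter interpolates between two intuitive limits, which is why a single global kernel of this kind can serve both for matching near-identical structures and for comparing structures of different size.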
Statistical learning based on a local representation of atomic structures provides a universal model of chemical stability.
Due to their strong dependence on local atomic environments, NMR chemical shifts are among the most powerful tools for structure elucidation of powdered solids or amorphous materials. Unfortunately, using them for structure determination depends on the ability to calculate them, which comes at the cost of high-accuracy first-principles calculations. Machine learning has recently emerged as a way to overcome the need for quantum chemical calculations, but for chemical shifts in solids it is hindered by the chemical and combinatorial space spanned by molecular solids, the strong dependency of chemical shifts on their environment, and the lack of an experimental database of shifts. We propose a machine learning method based on local environments to accurately predict chemical shifts of molecular solids and their polymorphs to within DFT accuracy. We also demonstrate that the trained model is able to determine, based on the match between experimentally measured and ML-predicted shifts, the structures of cocaine and the drug 4-[4-(2-adamantylcarbamoyl)-5-tert-butylpyrazol-1-yl]benzoic acid.
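The final step described above, choosing a structure by how well predicted shifts match measured ones, can be sketched as a simple ranking by root-mean-square error. This is a minimal sketch under strong assumptions: the shifts are already assigned atom by atom in the same order, the helper names `shift_rmse` and `rank_candidates` are hypothetical, and all numbers are synthetic placeholders rather than data from the paper.

```python
import numpy as np

def shift_rmse(experimental, predicted):
    """RMSE (ppm) between two equally ordered lists of chemical shifts."""
    e = np.asarray(experimental, dtype=float)
    p = np.asarray(predicted, dtype=float)
    if e.shape != p.shape:
        raise ValueError("shift lists must have the same length and order")
    return float(np.sqrt(np.mean((e - p) ** 2)))

def rank_candidates(experimental, candidates):
    """Return (name, rmse) pairs sorted from best to worst agreement."""
    scores = {name: shift_rmse(experimental, shifts)
              for name, shifts in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Synthetic example: one measured spectrum, two candidate structures.
measured = [10.0, 20.0, 30.0]
predicted_by_candidate = {
    "candidate_A": [10.5, 19.5, 30.5],
    "candidate_B": [12.0, 22.0, 28.0],
}
ranking = rank_candidates(measured, predicted_by_candidate)
best_name, best_rmse = ranking[0]
```

In practice the gap between the best and second-best RMSE would have to be judged against the expected prediction error of the model before a structure is considered determined.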
Polymorphism is common in molecular crystals, whose energy landscapes usually contain many structures with similar stability, but very different physical–chemical properties. Machine-learning techniques can accelerate the evaluation of energy and properties by side-stepping accurate but demanding electronic-structure calculations, and provide a data-driven classification of the most important molecular packing motifs.
Using the minima hopping global geometry optimization method on the density functional potential energy surface we show that the energy landscape of boron clusters is glass-like. Larger boron clusters have many structures which are lower in energy than the cages. This is in contrast to carbon and boron nitride systems, which can be clearly identified as structure seekers. The differences in the potential energy landscape explain why carbon and boron nitride systems are found in nature whereas pure boron fullerenes have not been found. We thus present a methodology which can make predictions on the feasibility of the synthesis of new nanostructures.
The experimental synthesis of fullerenes is a very difficult task. The carbon fullerene structures were therefore theoretically predicted [1] long before they could be produced in the lab [2]. Many more hollow and endohedrally doped fullerene structures made out of elements other than carbon have since been proposed theoretically [3] in the search for other possible building blocks for nanoscience. It is however surprising that, since the experimental discovery of the carbon fullerenes some 25 years ago, no other fullerenes have been synthesized. So the question is whether experimentalists have just not yet found a way to synthesize these theoretically predicted fullerenes, or whether they do not exist at all in nature. We have recently shown [4] that all the theoretically proposed endohedral Si20 fullerenes are meta-stable and can thus most likely not be found in nature. In this letter we investigate boron clusters in detail. Following the B80 fullerene structure proposed by Szwacki et al. [5], various other fullerene [6] and stuffed fullerene structures [7] were proposed. Subsequently, however, it was shown for B80 that there exist non-fullerene structures [8] which are lower in energy.
We will contrast the characteristics of the potential energy landscape (PES) of these boron clusters with those of systems found in nature, namely carbon and boron nitride fullerenes, and find that there are important differences.
To explore the energy landscape of the boron, carbon and boron nitride clusters we do global geometry optimizations on the density functional potential energy surface with the minima hopping algorithm [9]. This algorithm can render the global minimum configuration as well as many other low-energy meta-stable structures. All the density functional calculations are done with the BigDFT electronic structure code [10], which uses a systematic wavelet basis together with pseudopotentials [11] and the standard LDA [11] and PBE [12] exchange-correlation functionals.
We start out by analyzing the B16N16 cluster, which was found to be short-lived in experiments [13]. In this system structural rigidity is imposed by a strong preference for sp2 hybridization [14] as well as by the requirement that bonds are only formed between atoms of different type. This leads to a small configurational density of states. As shown in Fig. 1 there exists a fairly large energ...
Predictive computational methods have the potential to significantly accelerate the discovery of new materials with targeted properties by guiding the choice of candidate materials for synthesis. Recently, a planar pyrrole-based azaphenacene molecule (pyrido[2,3-b]pyrido[3′,2′:4,5]pyrrolo[3,2-g]indole, 1) was synthesized and shown to have promising properties for charge transport, which relate to stacking of molecules in its crystal structure. Building on our methods for evaluating small molecule organic semiconductors using crystal structure prediction, we have screened a set of 27 structural isomers of 1 to assess charge mobility in their predicted crystal structures. Machine-learning techniques are used to identify structural classes across the landscapes of all molecules and we find that, despite differences in the arrangement of hydrogen bond functionality, the predicted crystal structures of the molecules studied here can be classified into a small number of packing types. We analyze the predicted property landscapes of the series of molecules and discuss several metrics that can be used to rank the molecules as promising semiconductors. The results suggest several isomers with superior predicted electron mobilities to 1 and suggest two molecules in particular that represent attractive synthetic targets. Supporting Information Available: Details of the crystal structure classification scheme, information on convergence of the crystal structure search and number of unique crystal structures per molecule, eigenvalue spectrum of the SOAP similarity kernel, details of the electron mobility calculations, energy-structure-function maps of all molecules, discussion of uncertainties in the electron mobility calculations.
Atomic environment fingerprints are widely used in computational materials science, from machine learning potentials to the quantification of similarities between atomic configurations. Many approaches to the construction of such fingerprints, also called structural descriptors, have been proposed. In this work, we compare the performance of fingerprints based on the overlap matrix, the smooth overlap of atomic positions, Behler–Parrinello atom-centered symmetry functions, modified Behler–Parrinello symmetry functions used in the ANI-1ccx potential and the Faber–Christensen–Huang–Lilienfeld fingerprint under various aspects. We study their ability to resolve differences in local environments and in particular examine whether there are certain atomic movements that leave the fingerprints exactly or nearly invariant. For this purpose, we introduce a sensitivity matrix whose eigenvalues quantify the effect of atomic displacement modes on the fingerprint. Further, we check whether these displacements correlate with the variation of localized physical quantities such as forces. Finally, we extend our examination to the correlation between molecular fingerprints obtained from the atomic fingerprints and global quantities of entire molecules.
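To make the idea of a sensitivity matrix concrete, the sketch below builds one for a deliberately weak toy fingerprint and inspects its eigenvalues. It assumes the matrix is formed as S = JᵀJ, where J is the Jacobian of the fingerprint with respect to all Cartesian coordinates; the abstract does not spell out the construction, so this form, the radial-only fingerprint, the widths in `ETAS`, and the four-atom geometry are all illustrative assumptions.

```python
import numpy as np

ETAS = np.array([0.3, 1.0, 3.0, 6.0])  # hypothetical Gaussian widths

def radial_fingerprint(positions):
    """Toy fingerprint of atom 0: F_k = sum_j exp(-eta_k * r_0j**2)."""
    r2 = np.sum((positions[1:] - positions[0]) ** 2, axis=1)
    return np.exp(-np.outer(ETAS, r2)).sum(axis=1)

def sensitivity_matrix(fingerprint, positions, h=1e-5):
    """S = J^T J, with J estimated by central finite differences."""
    x0 = np.asarray(positions, dtype=float).ravel()
    n_features = fingerprint(x0.reshape(-1, 3)).size
    J = np.empty((n_features, x0.size))
    for a in range(x0.size):
        step = np.zeros_like(x0)
        step[a] = h
        f_plus = fingerprint((x0 + step).reshape(-1, 3))
        f_minus = fingerprint((x0 - step).reshape(-1, 3))
        J[:, a] = (f_plus - f_minus) / (2.0 * h)
    return J.T @ J

# Central atom at the origin with three neighbours at distinct distances.
positions = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.3, 0.0],
                      [0.0, 0.0, 1.7]])
S = sensitivity_matrix(radial_fingerprint, positions)
eigvals = np.linalg.eigvalsh(S)              # ascending order
n_sensitive = int(np.sum(eigvals > 1e-8))    # modes the fingerprint detects
```

For this geometry the toy fingerprint depends on only three distances, so only three of the twelve eigenvalues are non-zero. The nine near-zero modes comprise the six rigid translations and rotations plus three tangential motions of the neighbours that a radial-only fingerprint cannot see, which is the kind of blind spot such an analysis is meant to expose.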
Molecular-level understanding and characterization of solvation environments are often needed across chemistry, biology, and engineering. Toward practical modeling of local solvation effects of any solute in any solvent, we report a static and all-quantum mechanics-based cluster-continuum approach for calculating single-ion solvation free energies. This approach uses a global optimization procedure to identify low-energy molecular clusters with different numbers of explicit solvent molecules and then employs the smooth overlap of atomic positions learning kernel to quantify the similarity between different low-energy solute environments. From these data, we use sketch maps, a nonlinear dimensionality reduction algorithm, to obtain a two-dimensional visual representation of the similarity between solute environments in differently sized microsolvated clusters. After testing this approach on different ions having charges 2+, 1+, 1−, and 2−, we find that the solvation environment around each ion usually becomes more similar in step with its calculated single-ion solvation free energy. Without needing either dynamics simulations or a priori knowledge of the local solvation structure of the ions, this approach can be used to calculate solvation free energies within 5% of experimental measurements for most cases, and it should be transferable to the study of other systems where dynamics simulations are not easily carried out.