2023
DOI: 10.1039/d2dd00137c

Synthetic data enable experiments in atomistic machine learning

Abstract: Machine-learning models are increasingly used to predict properties of atoms in chemical systems. There have been major advances in developing descriptors and regression frameworks for this task, typically starting from...

Citations: cited by 5 publications (14 citation statements)
References: 91 publications
“…Although this data set is extensive and achieves substantial coverage of the reaction space, the number of occurrences a particular catalyst is used varies widely which may impact the reliability of the generality metrics derived from this literature curated data set (see above discussion). Consequently, and to explore different substrate–catalyst combinations more comprehensively, we investigated several robust non-linear machine learning (ML) regression techniques for correlating the enantioselectivity outcomes represented as ΔΔG‡ to the structure of the reaction components. The resulting models could then be deployed to create a virtual data set by predicting the enantioselectivity for every combination of imine, nucleophile, and catalyst contained in the experimental database.…”
Section: Results
mentioning
confidence: 99%
“…In conclusion, our work has shown that local-environment energies in ZIFs can be “machine-learned” using cg structural representations, with less than a factor of 2 loss of accuracy compared to established, fully atomistic approaches. In doing so, we showed that local energies from an empirical force field for ZIFs 14 can be readily available “synthetic” regression targets – extending prior work in the field of atomistic ML 15,16 to the construction of cg models. Chemically, our results provide direct and quantitative support for the long-standing idea that there exists a mapping between ZIFs and zeolites (Fig.…”
mentioning
confidence: 76%
“…The hypothetical ZIF dataset, including technical details of how it was constructed, is available at . In addition to per-cell (total) energies, this model by construction yields per-atom (local) energies – allowing us to build a “synthetic” dataset with which the properties of ML models can be studied, following ideas in ref. 15 and 16.…”
mentioning
confidence: 99%
“…More recently, researchers have been actively exploiting these locally predicted values to interpret the local stability of chemical environments in complex phases, 46−50 guide structural optimization, 51 and even use them as synthetic data for the pretraining of large neural network (NN) models. 52 While the practical benefits of local decomposition for atomistic ML are clear, one must be mindful of how reliable, or "robust", the resulting local predictions are. Since only the global quantity is rigorously defined, its decomposition into local contributions can take place in numerous different ways.…”
Section: Introduction
mentioning
confidence: 99%
“…In the case of ML models of the electronic density of states, a plausible correlation could be found between different local structural motifs and how they “contribute” to the total density of states. More recently, researchers have been actively exploiting these locally predicted values to interpret the local stability of chemical environments in complex phases, guide structural optimization, and even use them as synthetic data for the pretraining of large neural network (NN) models …”
Section: Introduction
mentioning
confidence: 99%