Millions of distinct metal-organic frameworks (MOFs) can be made by combining metal nodes and organic linkers. At present, over 90,000 MOFs have been synthesized and over 500,000 predicted. This raises the question whether a new experimental or predicted structure adds new information. For MOF chemists, the chemical design space is a combination of pore geometry, metal nodes, organic linkers, and functional groups, but at present we do not have a formalism to quantify optimal coverage of chemical design space. In this work, we develop a machine learning method to quantify similarities of MOFs to analyse their chemical diversity. This diversity analysis identifies biases in the databases, and we show that such bias can lead to incorrect conclusions. The developed formalism in this study provides a simple and practical guideline to see whether new structures will have the potential for new insights, or constitute a relatively small variation of existing structures.
Machine learning the electronic structure of open shell transition metal complexes presents unique challenges, including robust and automated data set generation. Here, we introduce tools that simplify data acquisition from density functional theory (DFT) and validation of trained machine learning models using the molSimplify automatic design (mAD) workflow. We demonstrate this workflow by training and comparing the performance of LASSO, kernel ridge regression (KRR), and artificial neural network (ANN) models using heuristic, topological revised autocorrelation (RAC) descriptors we have recently introduced for machine learning inorganic chemistry. On a series of open shell transition metal complexes, we evaluate set aside test errors of these models for predicting the HOMO level and HOMO-LUMO gap. The best performing models are ANNs, which show 0.15 and 0.25 eV test set mean absolute errors on the HOMO level and HOMO-LUMO gap, respectively. Poor performing KRR models using the full 153-feature RAC set are improved to nearly the same performance as the ANNs when trained on down-selected subsets of 20-30 features. Analysis of the essential descriptors for HOMO and HOMO-LUMO gap prediction as well as comparison to subsets previously obtained for other properties reveals the paramount importance of non-local, steric properties in determining frontier molecular orbital energetics. We demonstrate our model performance on diverse complexes and in the discovery of molecules with target HOMO-LUMO gaps from a large 15,000 molecule design space in minutes rather than days that full DFT evaluation would require.
Recent transformative advances in computing power and algorithms have made computational chemistry central to the discovery and design of new molecules and materials. First-principles simulations are increasingly accurate and applicable to large systems with the speed needed for high-throughput computational screening. Despite these strides, the combinatorial challenges associated with the vastness of chemical space mean that more than just fast and accurate computational tools are needed for accelerated chemical discovery. In transition-metal chemistry and catalysis, unique challenges arise. The variable spin, oxidation state, and coordination environments favored by elements with well-localized d or f electrons provide great opportunity for tailoring properties in catalytic or functional (e.g., magnetic) materials but also add layers of uncertainty to any design strategy. We outline five key mandates for realizing computationally driven accelerated discovery in inorganic chemistry: (i) fully automated simulation of new compounds, (ii) knowledge of prediction sensitivity or accuracy, (iii) faster-than-fast property prediction methods, (iv) maps for rapid chemical space traversal, and (v) a means to reveal design rules on the kilocompound scale. Through case studies in open-shell transition-metal chemistry, we describe how advances in methodology and software in each of these areas bring about new chemical insights. We conclude with our outlook on the next steps in this process toward realizing fully autonomous discovery in inorganic chemistry using computational chemistry.
A predictive approach for driving down machine learning model errors is introduced and demonstrated across discovery for inorganic and organic chemistry.
High-throughput computational screening for chemical discovery mandates the automated and unsupervised simulation of thousands of new molecules and materials. In challenging materials spaces, such as open shell transition metal chemistry, characterization requires time-consuming first-principles simulation that often necessitates human intervention. These calculations can frequently lead to a null result, e.g., the calculation does not converge or the molecule does not stay intact during a geometry optimization. To overcome this challenge toward realizing fully automated chemical discovery in transition metal chemistry, we have developed the first machine learning models that predict the likelihood of successful simulation outcomes. We train support vector machine and artificial neural network classifiers to predict simulation outcomes (i.e., geometry optimization result and degree of deviation) for a chosen electronic structure method based on chemical composition. For these static models, we achieve an area under the curve of at least 0.95, minimizing computational time spent on non-productive simulations and therefore enabling efficient chemical space exploration. We introduce a metric of model uncertainty based on the distribution of points in the latent space to systematically improve model prediction confidence. In a complementary approach, we train a convolutional neural network classification model on simulation output electronic and geometric structure time series data. This dynamic model generalizes more readily than the static classifier by becoming more predictive as input simulation length increases. Finally, we describe approaches for using these models to enable autonomous job control in transition metal complex discovery. File list (3) download file view on ChemRxiv DuanClassifier.pdf (2.94 MiB) download file view on ChemRxiv SIClassifier_v5.pdf (5.28 MiB) download file view on ChemRxiv data_set.zip (35.83 MiB)
The design of selective and active C−H activation catalysts for direct methane-to-methanol conversion is challenging. Bioinspired complexes that form high-valent metal−oxo intermediates capable of hydrogen abstraction and rebound hydroxylation are promising candidates. This promise has made them a target for computational high-throughput screening, typically simplified through the use of linear free energy relationships (LFERs). However, their mid-row transition-metal centers have numerous accessible spin and oxidation states that increase the combinatorial scale of design efforts. Here, we carry out a computational design screen of over 2500 mid-row 3d transition-metal complexes with four metals in numerous spin and oxidation states. We demonstrate the importance of spin/oxidation state in dictating design principles, limiting the generalization of strategies derived for widely studied high-spin Fe(II) catalysts to other metals or spin/oxidation states. Combined assessment of the effect of ligand-field tuning on reaction step energetics and on the identity of the ground state allows us to propose refined design strategies for spin-allowed methane-to-methanol catalysis. We observe weak coupling of energetics and design principles between reaction steps (e.g., oxo formation vs methanol release), meaning that LFERs do not generalize across our larger catalyst set. To rationalize relative reactivity in known catalysts, we instead compute independent reaction energies and propose strategies for further improvements in catalyst design.
High-throughput computational screening typically employs methods (i.e., density functional theory or DFT) that can fail to describe challenging molecules, such as those with strongly correlated electronic structure. In such cases, multireference (MR) correlated wavefunction theory (WFT) would be the appropriate choice but remains more challenging to carry out and automate than single-reference (SR) WFT or DFT. Numerous diagnostics have been proposed for identifying when MR character is likely to have an effect on the predictive power of SR calculations, but conflicting conclusions about diagnostic performance have been reached on small data sets. We compute 15 MR diagnostics, ranging from affordable DFT-based to more costly MR-WFT-based diagnostics, on a set of 3,165 equilibrium and distorted small organic molecules containing up to six heavy atoms. Conflicting MR character assignments and low pairwise linear correlations among diagnostics are also observed over this set. We evaluate the ability of existing diagnostics to predict the percent recovery of the correlation energy, %E corr . None of the DFT-based diagnostics are nearly as predictive of %E corr as the best WFT-based diagnostics. To overcome the limitation of this cost-accuracy trade-off, we develop machine learning (ML, i.e., kernel ridge regression) models to predict WFT-based diagnostics from a combination of DFT-based diagnostics and a new, size-independent 3D geometric representation. The ML-predicted diagnostics correlate as well with MR effects as their computed (i.e., with WFT) values, significantly improving over the DFT-based diagnostics on which the models were trained.These ML models thus provide a promising approach to improve upon DFT-based diagnostic accuracy while remaining suitably low cost for high-throughput screening. File list (4) download file view on ChemRxiv MRML1_v14.pdf (3.25 MiB) download file view on ChemRxiv SI_MRML1_v6.pdf (8.52 MiB) download file view on ChemRxiv SI_MRML1_Data_04112020.zip (4.00 MiB)
Metal−oxo moieties are important catalytic intermediates in the selective partial oxidation of hydrocarbons and in water splitting. Stable metal−oxo species have reactive properties that vary depending on the spin state of the metal, complicating the development of structure−property relationships. To overcome these challenges, we train machine-learning (ML) models capable of predicting metal−oxo formation energies across a range of first-row metals, oxidation states, and spin states. Using connectivity-only features tailored for inorganic chemistry as inputs to kernel ridge regression or artificial neural network (ANN) ML models, we achieve good mean absolute errors (4−5 kcal/mol) on set-aside test data across a range of ligand orientations. Analysis of feature importance for oxo formation energy prediction reveals the dominance of nonlocal, electronic ligand properties in contrast to other transition metal complex properties (e.g., spin-state or ionization potential). We enumerate the theoretical catalyst space with an ANN, revealing expected trends in oxo formation energetics, such as destabilization of the metal−oxo species with increasing d-filling, as well as exceptions, such as weak correlations with indicators of oxidative stability of the metal in the resting state or unexpected spin-state dependence in reactivity. We carry out uncertainty-aware evolutionary optimization using the ANN to explore a >37 000 candidate catalyst space. New metal and oxidation state combinations are uncovered and validated with density functional theory (DFT), including counterintuitive oxo formation energies for oxidatively stable complexes. This approach doubles the density of confirmed DFT leads in originally sparsely populated regions of property space, highlighting the potential of ML-model-driven discovery to uncover catalyst design rules and exceptions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.