Zeolites are porous aluminosilicate materials with many industrial and “green” applications. Despite their industrial relevance, many aspects of zeolite synthesis remain poorly understood, requiring costly trial-and-error synthesis. In this paper, we develop natural language processing techniques and text markup parsing tools to automatically extract synthesis information and trends from zeolite journal articles. We further engineer a data set of germanium-containing zeolites to test the accuracy of the extracted data and to discover potential opportunities for zeolites containing germanium. We also create a regression model that predicts a zeolite’s framework density from its synthesis conditions. This model has a cross-validated root mean squared error of 0.98 T/1000 Å³, and many of the model’s decision boundaries correspond to known synthesis heuristics for germanium-containing zeolites. We propose that this automatic data extraction can be applied to many different problems in zeolite synthesis and can enable novel zeolite morphologies.
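A cross-validated RMSE like the one quoted above comes from holding out each fold in turn, fitting on the remaining data, and pooling the squared errors of the held-out predictions. The sketch below illustrates that procedure with a depth-1 regression tree (a "decision stump"), chosen only because its single threshold is the simplest kind of decision boundary. It is an assumption for illustration, not the model used in this work. The function names, the interleaved fold assignment, and the single hypothetical Ge-fraction feature in the demo are all invented here, and the demo values are made up.

```python
import math
from typing import Callable, Sequence


def fit_stump(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Fit a depth-1 regression tree: one threshold, mean target on each side."""
    pairs = sorted(zip(xs, ys))
    best = None  # (sum of squared errors, threshold, left mean, right mean)
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        if not left or not right:
            continue  # skip splits that leave one side empty (e.g. tied x values)
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # no valid split: predict the overall mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr


def cv_rmse(xs: Sequence[float], ys: Sequence[float], k: int = 5) -> float:
    """Pooled k-fold CV RMSE: each point is predicted by a model that never saw it."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= number of samples")
    sq_err = 0.0
    for fold in range(k):
        test_idx = list(range(fold, n, k))  # interleaved folds, no shuffling
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit_stump(train_x, train_y)
        sq_err += sum((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return math.sqrt(sq_err / n)


if __name__ == "__main__":
    # Made-up toy values for illustration only (NOT data from the paper).
    ge_fraction = [0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    density = [17.9, 17.6, 17.8, 16.2, 15.9, 15.5, 15.1, 14.8, 15.0, 14.6]
    print(f"CV RMSE: {cv_rmse(ge_fraction, density, k=5):.2f} T/1000 A^3")
```

Pooling squared errors across folds is one convention; averaging the per-fold RMSE values is another, and the two give slightly different numbers.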
Zeolites are versatile catalysts and molecular sieves with large topological diversity, but managing phase competition in zeolite synthesis is an empirical, labor-intensive task. Here, we controlled phase selectivity in templated zeolite synthesis from first principles by combining high-throughput atomistic simulations, literature mining, human-computer interaction, synthesis, and characterization. Proposed binding metrics distilled from over 586,000 zeolite-molecule simulations reproduced the extracted literature and rationalized framework competition in the design of organic structure-directing agents. Energetic, geometric, and electrostatic descriptors of template molecules were found to regulate synthetic accessibility windows and aluminum distributions in pure-phase zeolites. Furthermore, these parameters enabled the realization of an intergrowth zeolite through a single bi-selective template. The computation-first approach enabled control over both zeolite synthesis and structure composition using a priori theoretical descriptors.
Leveraging new data sources is a key step in accelerating the pace of materials design and discovery. To complement the strides in synthesis planning driven by historical, experimental, and computed data, we present an automated method for connecting scientific literature to synthesis insights. Starting from natural language text, we apply word embeddings from language models, which are fed into a named entity recognition model, upon which a conditional variational autoencoder is trained to generate syntheses for arbitrary materials. We show the potential of this technique by predicting precursors for two perovskite materials, using only training data published over a decade prior to their first reported syntheses. We demonstrate that the model learns representations of materials corresponding to synthesis-related properties, and that the model's behavior complements existing thermodynamic knowledge. Finally, we apply the model to perform synthesizability screening for proposed novel perovskite compounds.
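A conditional variational autoencoder is typically trained by maximizing a conditional evidence lower bound. As a hedged illustration (the exact loss used in this work is not stated here), let x be a synthesis representation such as a precursor set, c the target material, and z a latent vector. The standard CVAE objective, with encoder q_φ and decoder p_θ, is:

```latex
\log p_\theta(x \mid c) \;\ge\;
\mathcal{L}(\theta, \phi; x, c)
= \mathbb{E}_{q_\phi(z \mid x, c)}\!\left[\log p_\theta(x \mid z, c)\right]
- D_{\mathrm{KL}}\!\left(q_\phi(z \mid x, c) \,\|\, p(z \mid c)\right)
```

Many implementations replace the conditional prior p(z | c) with a fixed standard normal N(0, I). To generate candidate syntheses for a new material, z is sampled from the prior and decoded together with c.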
Organic structure-directing agents (OSDAs) play a crucial role in the synthesis of micro- and mesoporous materials, especially zeolites. Despite the wide use of OSDAs, their interaction with zeolite frameworks is poorly understood, and researchers rely on synthesis heuristics or computationally expensive techniques to predict whether an organic molecule can act as an OSDA for a given zeolite. In this paper, we take a data-driven approach to unearth generalized OSDA–zeolite relationships using a comprehensive database of 5,663 synthesis routes for porous materials. To generate this database, we use natural language processing and text mining techniques to extract OSDAs, zeolite phases, and gel chemistry from the scientific literature published between 1966 and 2020. Through structural featurization of the OSDAs using weighted holistic invariant molecular (WHIM) descriptors, we relate OSDAs described in the literature to different types of cage-based, small-pore zeolites. Lastly, we adapt a generative neural network capable of suggesting new molecules as potential OSDAs for a given zeolite structure and gel chemistry. We apply this model to CHA and SFW zeolites, generating several alternative OSDA candidates to those currently used in practice. These molecules are further vetted with molecular mechanics simulations to show that the model generates physically meaningful predictions. Our model can automatically explore the OSDA space, reducing the amount of simulation or experimentation needed to find new OSDA candidates.
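WHIM descriptors start from a weighted principal component analysis of a molecule's 3D atomic coordinates. The eigenvalues of a weighted covariance matrix of those coordinates supply the directional size information. The sketch below covers only that first step, under stated assumptions: it centres on the weighted mean, and the weights are left to the caller (schemes such as unit weights, atomic mass, or polarizability exist, and centring conventions vary between implementations). It stops before the eigendecomposition. It reports the trace, which equals the sum of the three eigenvalues. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def weighted_covariance(coords: Sequence[Point], weights: Sequence[float]) -> List[List[float]]:
    """3x3 weighted covariance of atomic coordinates about the weighted centre."""
    if not coords or len(coords) != len(weights):
        raise ValueError("need at least one atom and exactly one weight per atom")
    wsum = sum(weights)
    # Assumption: centre on the weighted mean of each coordinate axis.
    centre = [sum(w * q[j] for q, w in zip(coords, weights)) / wsum for j in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for q, w in zip(coords, weights):
        d = [q[j] - centre[j] for j in range(3)]
        for j in range(3):
            for k in range(3):
                cov[j][k] += w * d[j] * d[k]
    return [[value / wsum for value in row] for row in cov]


def total_size(cov: List[List[float]]) -> float:
    """Trace of the covariance matrix, i.e. lambda1 + lambda2 + lambda3."""
    return cov[0][0] + cov[1][1] + cov[2][2]
```

Because the coordinates are centred first, the matrix (and its eigenvalues) does not change when the molecule is translated. Eigendecomposing it and using the eigenvectors as axes also removes any dependence on rotation.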
Data-driven synthesis planning with machine learning is a key step in the design and discovery of novel inorganic compounds with desirable properties. Inorganic materials synthesis is often guided by heuristics and chemists' prior knowledge and experience, built upon experimental trial and error that can be both time- and resource-consuming. Recent developments in natural language processing have enabled large-scale text mining of scientific literature, providing open-source databases of synthesis information on realized compounds, material precursors, and reaction conditions (temperatures, times). We employ supervised classification machine learning (ML) models to distinguish between solid-state, sol−gel, and solution (hydrothermal, precipitation) synthesis routes based on the specified reaction target material and/or precursor materials. We demonstrate regression ML models that are able to predict suitable temperatures and times for the crucial inorganic synthesis steps of calcination and sintering, given the reaction target and precursor materials. We contrast this regression-based condition modeling with a conditional variational autoencoder neural network that can generate appropriate distributions for the synthesis conditions of interest. We evaluate model interpretability using the Shapley additive explanations approach to gain insight into the factors influencing the suitability of a synthesis route and reaction conditions. We find that these models are capable of learning subtle differences in target material composition, precursor compound identities, and choice of synthesis route that are present in the inorganic synthesis space. Moreover, they generalize well to unseen chemical entities, outperform common heuristics in the field, and show promise for predicting appropriate reaction routes and conditions for previously unsynthesized compounds of interest.
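Shapley additive explanations attribute a model's prediction to its input features using Shapley values. Libraries such as SHAP estimate these efficiently, often against a background dataset. For a handful of features, the values can be computed exactly by enumerating feature coalitions, which is what this sketch does. Assumptions: features outside a coalition are set to a single baseline vector (one of several conventions), and the two-feature "calcination temperature" function in the demo is a made-up stand-in, not a model from this work. The function name is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    f: Callable[[List[float]], float], x: Sequence[float], baseline: Sequence[float]
) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition use the baseline."""
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")

    def value(coalition: set) -> float:
        # Present features take their value from x, absent ones from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical linear "temperature" model with two made-up features.
    temperature = lambda z: 600.0 + 50.0 * z[0] - 20.0 * z[1]
    print(shapley_values(temperature, [2.0, 1.0], [0.0, 0.0]))  # [100.0, -20.0]
```

The attributions always sum to f(x) − f(baseline), which makes a convenient sanity check. Exact enumeration needs on the order of n·2^n model evaluations, so it only suits small feature counts. Sampling or model-specific explainers are the practical choice beyond that.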