Tanjin He scite author profile

Text-mined dataset of inorganic materials synthesis recipes

Materials discovery has been significantly facilitated and accelerated by high-throughput ab initio computations. This ability to rapidly design interesting novel compounds has shifted the materials innovation bottleneck to the development of synthesis routes for the desired materials. As there is no fundamental theory of materials synthesis, one might attempt a data-driven approach to predicting inorganic materials synthesis, but this is impeded by the lack of a comprehensive database of synthesis procedures. To overcome this limitation, we have generated a dataset of “codified recipes” for solid-state synthesis, automatically extracted from scientific publications. The dataset consists of 19,488 synthesis entries retrieved from 53,538 solid-state synthesis paragraphs using text mining and natural language processing approaches. Every entry contains information about the target material, the starting compounds, the operations used and their conditions, as well as the balanced chemical equation of the synthesis reaction. The dataset is publicly available and can be used for data mining of various aspects of inorganic materials synthesis.
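To make the idea of a “codified recipe” concrete, here is a minimal sketch of one entry and of a check that its chemical equation is balanced. The `SynthesisEntry` class and its field names are hypothetical illustrations, not the published dataset's actual schema, and the formula parser assumes simple formulas without parentheses or hydrate notation.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical record layout; field names are illustrative, not the dataset's schema.
@dataclass
class SynthesisEntry:
    target: str
    precursors: list[str]
    # (operation type, conditions) pairs, e.g. ("heating", {"temperature_C": ...})
    operations: list[tuple[str, dict]] = field(default_factory=list)
    reactants: dict[str, float] = field(default_factory=dict)  # formula -> coefficient
    products: dict[str, float] = field(default_factory=dict)   # formula -> coefficient

# Element symbol followed by an optional, possibly fractional, amount.
ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")

def element_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'BaCO3' or 'La0.5Sr0.5MnO3'.
    Parentheses and hydrate dots are not handled in this sketch."""
    counts = Counter()
    for symbol, amount in ELEMENT_RE.findall(formula):
        counts[symbol] += float(amount) if amount else 1.0
    return counts

def side_totals(side: dict[str, float]) -> Counter:
    """Total atoms of each element on one side of the equation."""
    totals = Counter()
    for formula, coeff in side.items():
        for element, n in element_counts(formula).items():
            totals[element] += coeff * n
    return totals

def is_balanced(entry: SynthesisEntry, tol: float = 1e-9) -> bool:
    """True if every element appears in equal amounts on both sides."""
    left, right = side_totals(entry.reactants), side_totals(entry.products)
    return all(abs(left[el] - right[el]) <= tol for el in set(left) | set(right))
```

For an entry encoding BaCO3 + TiO2 → BaTiO3 + CO2, `is_balanced` returns `True`; dropping CO2 from the products makes it return `False`, which is the kind of consistency check a balanced-equation field makes possible.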

A chemical kinetic mechanism for the low- and intermediate-temperature combustion of Polyoxymethylene Dimethyl Ether 3 (PODE3)

Zhi

You

et al. 2018

Fuel

101

104

Semi-supervised machine-learning classification of materials synthesis procedures

et al. 2019

Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as "grinding" and "heating", "dissolving" and "centrifuging", etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach offers a scalable way to unlock the large amount of inorganic materials synthesis information in the literature and to process it into a standardized, machine-readable database.
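The Markov chain step can be illustrated with a short sketch that estimates first-order transition probabilities from ordered step labels and keeps the likely transitions as flowchart edges. It assumes the steps have already been labeled (in the paper, via the topic model); the `<start>`/`<end>` sentinel states and the `min_prob` threshold are hypothetical choices made for this illustration.

```python
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # hypothetical sentinel states marking recipe boundaries

def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities P(next | current)
    from ordered lists of synthesis-step labels."""
    counts = defaultdict(Counter)
    for steps in sequences:
        path = [START, *steps, END]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }

def flowchart_edges(probs, min_prob=0.1):
    """Keep transitions at or above min_prob as (from, to, probability) edges."""
    return sorted(
        (current, nxt, p)
        for current, followers in probs.items()
        for nxt, p in followers.items()
        if p >= min_prob
    )
```

For the two sequences `["grinding", "heating"]` and `["grinding", "heating", "grinding", "heating"]`, the estimated transitions out of `"heating"` are `<end>` with probability 2/3 and `"grinding"` with probability 1/3, capturing the repeated grind-and-heat loop common in solid-state recipes.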

Opportunities and challenges of text mining in materials research

Kononova

Huo

et al. 2021

iScience

Research publications are the major repository of scientific knowledge. However, their unstructured and highly heterogeneous format creates a significant obstacle to large-scale analysis of the information they contain. Recent progress in natural language processing (NLP) has provided a variety of tools for high-quality information extraction from unstructured text. These tools are primarily trained on non-technical text and struggle to produce accurate results when applied to scientific text, which involves specialized technical terminology. In recent years, significant efforts in information retrieval have been made for biomedical and biochemical publications. For materials science, text mining (TM) methodology is still at the dawn of its development. In this review, we survey the recent progress in creating and applying TM and NLP approaches in the materials science field. This review is directed at the broad class of researchers aiming to learn the fundamentals of TM as applied to materials science publications.

Similarity of Precursors in Solid-State Synthesis as Text-Mined from Scientific Literature

Sun

Huo

et al. 2020

Chem. Mater.

Collecting and analyzing the vast amount of information available in the solid-state chemistry literature may accelerate our understanding of materials synthesis. However, one major problem is the difficulty of identifying which materials in a synthesis paragraph are precursors and which are targets. In this study, we developed a two-step Chemical Named Entity Recognition (CNER) model to identify precursors and targets, based on information from the context around material entities. Using the extracted data, we conducted a meta-analysis to study the similarities and differences between precursors in the context of solid-state synthesis. To quantify precursor similarity, we built a substitution model to calculate the viability of substituting one precursor with another while retaining the target. From a hierarchical clustering of the precursors, we demonstrate that "chemical similarity" of precursors can be extracted from text data. Quantifying the similarity of precursors helps provide a foundation for suggesting candidate reactants in a predictive synthesis model.
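As a rough illustration of how substitution evidence can be mined from extracted recipes, the sketch below counts precursor swaps: two recipes for the same target whose precursor sets differ by exactly one compound on each side. This is a simplified counting scheme assumed for illustration, not the substitution model described in the paper; the `(target, precursors)` input format is likewise hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def substitution_counts(recipes):
    """Count precursor swaps (a, b) seen between two recipes for the same
    target whose precursor sets differ by exactly one compound each.

    recipes: iterable of (target, precursors) pairs (hypothetical format).
    """
    by_target = defaultdict(set)
    for target, precursors in recipes:
        by_target[target].add(frozenset(precursors))  # deduplicate identical recipes
    swaps = Counter()
    for precursor_sets in by_target.values():
        for s1, s2 in combinations(precursor_sets, 2):
            only1, only2 = s1 - s2, s2 - s1
            if len(only1) == 1 and len(only2) == 1:
                a, b = next(iter(only1)), next(iter(only2))
                swaps[tuple(sorted((a, b)))] += 1  # order-independent pair key
    return swaps
```

Pairs with high swap counts across many targets are natural candidates for "chemically similar" precursors; the raw counts could then be normalized into a similarity score and fed to a clustering step.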

Synthetic accessibility and stability rules of NASICONs

Ouyang

Wang

et al. 2021

Nat Commun

In this paper, we develop stability rules for NASICON-structured materials, as an example of compounds with complex bond topology and composition. By first-principles high-throughput computation of 3881 potential NASICON phases, we have developed guiding stability rules for NASICONs and validated the ab initio predictive capability by attempting the synthesis of six materials, five of which were successful. A simple two-dimensional descriptor for predicting NASICON stability was extracted with sure independence screening and machine-learned ranking, which classifies NASICON phases in terms of their synthetic accessibility. This machine-learned tolerance factor is based on the Na content, elemental radii and electronegativities, and the Madelung energy, and can offer reasonable accuracy in separating stable from unstable NASICONs. This work not only provides tools to understand the synthetic accessibility of NASICON-type materials but also demonstrates an efficient paradigm for discovering new materials with complicated composition and atomic structure.

Recent progress in the application in compression ignition engines and the synthesis technologies of polyoxymethylene dimethyl ethers

Liu

Wang

et al. 2019

Applied Energy

Machine-Learning Rationalization and Prediction of Solid-State Synthesis Conditions

Huo

Bartel

et al. 2022

Chem. Mater.

There currently exist no quantitative methods to determine the appropriate conditions for solid-state synthesis. This not only hinders the experimental realization of novel materials but also complicates the interpretation and understanding of solid-state reaction mechanisms. Here, we demonstrate a machine-learning approach that predicts synthesis conditions using large solid-state synthesis datasets text-mined from scientific journal articles. Using feature importance ranking analysis, we discovered that optimal heating temperatures correlate strongly with the stability of the precursor materials, quantified using melting points and formation energies (ΔG_f, ΔH_f). In contrast, features derived from the thermodynamics of synthesis-related reactions did not directly correlate with the chosen heating temperatures. This correlation between optimal solid-state heating temperature and precursor stability extends Tammann’s rule from intermetallics to oxide systems, suggesting the importance of reaction kinetics in determining synthesis conditions. Heating times are shown to be strongly correlated with the chosen experimental procedures and instrument setups, which may be indicative of human bias in the dataset. Using these predictive features, we constructed machine-learning models with good performance and general applicability to predict the conditions required to synthesize diverse chemical systems.
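The kind of question asked here, which candidate features track the chosen heating temperature, can be sketched with a simple univariate ranking by absolute Pearson correlation. This is a deliberately simpler stand-in for a model-based feature importance analysis, not the paper's method; feature columns (for example, a precursor melting point per recipe) are hypothetical inputs you would assemble from your own data, and every column is assumed to have nonzero variance.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.
    Assumes neither sequence is constant (nonzero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(features, target):
    """Rank feature columns by |Pearson r| with the target, strongest first.

    features: dict mapping feature name -> list of values (one per recipe)
    target:   list of heating temperatures aligned with the feature lists
    """
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A univariate ranking like this misses interactions between features, which is one reason a model-based importance analysis is usually preferred once a predictive model has been trained.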
