Haoyan Huo scite author profile

Materials discovery has been significantly facilitated and accelerated by high-throughput ab-initio computations. This ability to rapidly design interesting novel compounds has displaced the materials innovation bottleneck to the development of synthesis routes for the desired material. As there is no fundamental theory for materials synthesis, one might attempt a data-driven approach for predicting inorganic materials synthesis, but this is impeded by the lack of a comprehensive database containing synthesis processes. To overcome this limitation, we have generated a dataset of “codified recipes” for solid-state synthesis automatically extracted from scientific publications. The dataset consists of 19,488 synthesis entries retrieved from 53,538 solid-state synthesis paragraphs by using text mining and natural language processing approaches. Every entry contains information about target material, starting compounds, operations used and their conditions, as well as the balanced chemical equation of the synthesis reaction. The dataset is publicly available and can be used for data mining of various aspects of inorganic materials synthesis.

Unified Representation of Molecules and Crystals for Machine Learning

Huo

Rupp

2017

Preprint

132

Accurate simulations of atomistic systems from first principles are limited by computational cost. In high-throughput settings, machine learning can potentially reduce these costs significantly by accurately interpolating between reference calculations. For this, kernel learning approaches crucially require a single Hilbert space accommodating arbitrary atomistic systems. We introduce a many-body tensor representation that is invariant to translations, rotations and nuclear permutations of same elements, unique, differentiable, can represent molecules and crystals, and is fast to compute. Empirical evidence is presented for energy prediction errors below 1 kcal/mol for 7 k organic molecules and 5 meV/atom for 11 k elpasolite crystals. Applicability is demonstrated for phase diagrams of Pt-group/transition-metal binary systems.

Semi-supervised machine-learning classification of materials synthesis procedures

et al. 2019

Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as "grinding" and "heating", "dissolving" and "centrifuging", etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach offers a scalable way to unlock the large amount of inorganic materials synthesis information in the literature and to process it into a standardized, machine-readable database.

Opportunities and challenges of text mining in materials research

Kononova

Huo

et al. 2021

iScience

Research publications are the major repository of scientific knowledge. However, their unstructured and highly heterogeneous format creates a significant obstacle to large-scale analysis of the information contained within. Recent progress in natural language processing (NLP) has provided a variety of tools for high-quality information extraction from unstructured text. These tools are primarily trained on non-technical text and struggle to produce accurate results when applied to scientific text, which involves specific technical terminology. In recent years, significant efforts in information retrieval have been made for biomedical and biochemical publications. For materials science, text mining (TM) methodology is still at the dawn of its development. In this review, we survey the recent progress in creating and applying TM and NLP approaches to the materials science field. This review is directed at the broad class of researchers aiming to learn the fundamentals of TM as applied to materials science publications.

Toward autonomous design and synthesis of novel inorganic materials

et al. 2021

Similarity of Precursors in Solid-State Synthesis as Text-Mined from Scientific Literature

Sun

Huo

et al. 2020

Chem. Mater.

Collecting and analyzing the vast amount of information available in the solid-state chemistry literature may accelerate our understanding of materials synthesis. However, one major problem is the difficulty of identifying which materials from a synthesis paragraph are precursors or are target materials. In this study, we developed a two-step Chemical Named Entity Recognition (CNER) model to identify precursors and targets, based on information from the context around material entities. Using the extracted data, we conducted a meta-analysis to study the similarities and differences between precursors in the context of solid-state synthesis. To quantify precursor similarity, we built a substitution model to calculate the viability of substituting one precursor with another while retaining the target. From a hierarchical clustering of the precursors, we demonstrate that "chemical similarity" of precursors can be extracted from text data. Quantifying the similarity of precursors helps provide a foundation for suggesting candidate reactants in a predictive synthesis model.

Unified representation of molecules and crystals for machine learning

Huo

Rupp

2022

Mach. Learn.: Sci. Technol.

Accurate simulations of atomistic systems from first principles are limited by computational cost. In high-throughput settings, machine learning can reduce these costs significantly by accurately interpolating between reference calculations. For this, kernel learning approaches crucially require a representation that accommodates arbitrary atomistic systems. We introduce a many-body tensor representation that is invariant to translations, rotations, and nuclear permutations of same elements, unique, differentiable, can represent molecules and crystals, and is fast to compute. Empirical evidence for competitive energy and force prediction errors is presented for changes in molecular structure, crystal chemistry, and molecular dynamics using kernel regression and symmetric gradient-domain machine learning as models. Applicability is demonstrated for phase diagrams of Pt-group/transition-metal binary systems.

Synthetic accessibility and stability rules of NASICONs

Ouyang

Wang

et al. 2021

Nat Commun

In this paper we develop the stability rules for NASICON-structured materials, as an example of compounds with complex bond topology and composition. By first-principles high-throughput computation of 3881 potential NASICON phases, we have developed guiding stability rules of NASICON and validated the ab initio predictive capability through the synthesis of six attempted materials, five of which were successful. A simple two-dimensional descriptor for predicting NASICON stability was extracted with sure independence screening and machine learned ranking, which classifies NASICON phases in terms of their synthetic accessibility. This machine-learned tolerance factor is based on the Na content, elemental radii and electronegativities, and the Madelung energy and can offer reasonable accuracy for separating stable and unstable NASICONs. This work not only provides tools to understand the synthetic accessibility of NASICON-type materials, but also demonstrates an efficient paradigm for discovering new materials with complicated composition and atomic structure.

Text-mined dataset of inorganic materials synthesis recipes
