Materials discovery has become significantly facilitated and accelerated by high-throughput ab-initio computations. This ability to rapidly design interesting novel compounds has displaced the materials innovation bottleneck to the development of synthesis routes for the desired material. As there is no a fundamental theory for materials synthesis, one might attempt a data-driven approach for predicting inorganic materials synthesis, but this is impeded by the lack of a comprehensive database containing synthesis processes. To overcome this limitation, we have generated a dataset of “codified recipes” for solid-state synthesis automatically extracted from scientific publications. The dataset consists of 19,488 synthesis entries retrieved from 53,538 solid-state synthesis paragraphs by using text mining and natural language processing approaches. Every entry contains information about target material, starting compounds, operations used and their conditions, as well as the balanced chemical equation of the synthesis reaction. The dataset is publicly available and can be used for data mining of various aspects of inorganic materials synthesis.
Over the past decades, the number of published materials science articles has increased manyfold. Now, a major bottleneck in the materials discovery pipeline arises in connecting new results with the previously established literature. A potential solution to this problem is to map the unstructured raw-text of published articles onto a structured database entry that allows for programmatic querying. To this end, we apply text-mining with named entity recognition (NER), along with entity normalization, for large-scale information extraction from the published materials science literature. The NER is based on supervised machine learning with a recurrent neural network architecture, and the model is trained to extract summary-level information from materials science documents, including: inorganic material mentions, sample descriptors, phase labels, material properties and applications, as well as any synthesis and characterization methods used. Our classifer, with an overall accuracy (f1) of 87% on a test set, is applied to information extraction from 3.27 million materials science abstracts-the most information-dense section of published articles.Overall, we extract more than 80 million materials-science-related named entities, and the content of each abstract is represented as a database entry in a structured format. Our database shows far greater recall in document retrieval when compared to traditional text-based searches due to an entity normalization procedure that recognizes synonyms. We demonstrate that simple database queries can be used to answer complex \meta-questions" of the published literature that would have previously required laborious, manual literature searches to answer. All of our data has been made freely available for bulk download; we have also made a public facing application programming interface (https://github.com/materialsintelligence/matscholar) and website http://matscholar.herokuapp.com/search for easy interfacing with the data, trained models and functionality described in this paper. These results will allow researchers to access targeted information on a scale and with a speed that has not been previously available, and can be expected to accelerate the pace of future materials science discovery.
Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked-up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as "grinding" and "heating", "dissolving" and "centrifuging", etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach enables a scalable approach to unlock the large amount of inorganic materials synthesis information from the literature and to process it into a standardized, machine-readable database.
We demonstrate that an antiferromagnet can be employed for a highly efficient electrical manipulation of a ferromagnet. In our study, we use an electrical detection technique of the ferromagnetic resonance driven by an in-plane ac current in a NiFe/IrMn bilayer. At room temperature, we observe antidampinglike spin torque acting on the NiFe ferromagnet, generated by an in-plane current driven through the IrMn antiferromagnet. A large enhancement of the torque, characterized by an effective spin-Hall angle exceeding most heavy transition metals, correlates with the presence of the exchange-bias field at the NiFe/IrMn interface. It highlights that, in addition to the strong spin-orbit coupling, the antiferromagnetic order in IrMn governs the observed phenomenon.
Collecting and analyzing the vast amount of information available in the solid-state chemistry literature may accelerate our understanding of materials synthesis. However, one major problem is the difficulty of identifying which materials from a synthesis paragraph are precursors or are target materials. In this study, we developed a two-step Chemical Named Entity Recognition (CNER) model to identify precursors and targets, based on information from the context around material entities. Using the extracted data, we conducted a meta-analysis to study the similarities and differences between precursors in the context of solid-state synthesis. To quantify precursor similarity, we built a substitution model to calculate the viability of substituting one precursor with another while retaining the target. From a hierarchical clustering of the precursors, we demonstrate that "chemical similarity" of precursors can be extracted from text data. Quantifying the similarity of precursors helps provide a foundation for suggesting candidate reactants in a predictive synthesis model.
Over the past decades, the number of published materials science articles has increased manyfold. Now, a major bottleneck in the materials discovery pipeline arises in connecting new results with the previously established literature. A potential solution to this problem is to map the unstructured raw-text of published articles onto a structured database entry that allows for programmatic querying. To this end, we apply text-mining with named entity recognition (NER), along with entity normalization, for large-scale information extraction from the published materials science literature. The NER is based on supervised machine learning with a recurrent neural network architecture, and the model is trained to extract summary-level information from materials science documents, including: inorganic material mentions, sample descriptors, phase labels, material properties and applications, as well as any synthesis and characterization methods used. Our classifier, with an overall accuracy (f 1) of 87% on a test set, is applied to information extraction from 3.27 million materials science abstracts -the most information-dense section of published articles. Overall, we extract more than 80 million materials-science-related named entities, and the content of each abstract is represented as a database entry in a structured format. Our database shows far greater recall in document retrieval when compared to traditional text-based searches due to an entity normalization procedure that recognizes synonyms. We demonstrate that simple database queries can be used to answer complex "meta-questions" of the published literature that would have previously required laborious, manual literature searches to answer. All of our data has been made freely available for bulk download; we have also made a public facing application programming interface (https://github.com/materialsintelligence/matscholar) and website (http://matscholar.herokuapp.com/search) for easy interfacing with the data, trained models and functionality described in this paper. These results will allow researchers to access targeted information on a scale and with a speed that has not been previously available, and can be expected to accelerate the pace of future materials science discovery.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.