2019
DOI: 10.1021/acs.jcim.9b00470

Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature

Abstract: Over the past decades, the number of published materials science articles has increased manyfold. Now, a major bottleneck in the materials discovery pipeline arises in connecting new results with the previously established literature. A potential solution to this problem is to map the unstructured raw-text of published articles onto a structured database entry that allows for programmatic querying. To this end, we apply text-mining with named entity recognition (NER), along with entity normalization, for large…

Cited by 177 publications (148 citation statements)
References 48 publications

“…It does not focus solely on experimental procedures and is also able to extract spectroscopic attributes or information present in tables, for instance. Weston et al. follow a similar strategy and apply their method on materials science abstracts with the goal to produce easily searchable knowledge databases 18.…”
mentioning
confidence: 99%
“…These newly predicted compounds comprise 13 phosphides, 17 arsenides, 8 antimonides, and 7 bismuthides (Figure 5). A survey of the published literature using Matscholar (www.matscholar.com) 48 further confirms that these 45 compounds are indeed new and previously unreported (except LiCdSb 45). The larger number of undiscovered arsenides could be due to synthesis challenges associated with handling As, which is toxic.…”
Section: Stability Of Predicted Compounds
mentioning
confidence: 55%
“…Unsupervised and semi-supervised machine learning have been recently used to by-pass that problem, allowing one to reconstruct not only flowcharts of possible synthetic procedures, but also to capture latent chemical knowledge such as the prediction of promising material candidates years prior their publication. [14][15][16] However, the quality of the datasets and the predictive power of literature-based models can only be as good as the original literature data. Many technical details and conditions are often omitted from reported procedures, which rely on the experience of the experimentalist and hidden instrument settings.…”
Section: Harnessing the Power Of The Literature
mentioning
confidence: 99%