2021
DOI: 10.1021/acs.jcim.1c00554

Machine-Guided Polymer Knowledge Extraction Using Natural Language Processing: The Example of Named Entity Normalization

Abstract: A rich body of literature has emerged in recent years that discusses the extraction of structured information from materials science text through named entity recognition models. Relatively little work has been done to address the “normalization” of extracted entities, that is, recognizing that two or more seemingly different entities actually refer to the same entity in reality. In this work, we address the normalization of polymer named entities, polymers being a class of materials that often have a variety …

Cited by 15 publications (11 citation statements)
References 44 publications
“…This accounts for the majority of polymers with multiple reported names as detailed in Ref. 31. Out of the remaining neat polymer records that did not have a normalized polymer name, we then counted all unique polymer names (accounting for case variations) and added them to the number of unique normalized polymer names to arrive at the estimated number of unique polymers.…”
Section: Results (citation type: mentioning)
confidence: 99%
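
The counting step described in this statement can be sketched as follows. This is a minimal illustration under stated assumptions, not the citing authors' code: the record layout (dicts with hypothetical `name` and `normalized_name` keys) and the function name `estimate_unique_polymers` are invented here, and case folding stands in for "accounting for case variations".

```python
def estimate_unique_polymers(records):
    """Estimate the number of unique polymers in a set of records.

    Assumption: each record is a dict with a 'name' key and an optional
    'normalized_name' key (hypothetical field names).
    """
    normalized = set()    # unique normalized polymer names
    unnormalized = set()  # unique raw names, case-insensitive

    for rec in records:
        norm = rec.get("normalized_name")
        if norm:
            normalized.add(norm)
        else:
            # No normalized name: fall back to the raw name,
            # folded so that case variants count only once.
            unnormalized.add(rec["name"].casefold())

    # Add the two counts, as the statement describes.
    return len(normalized) + len(unnormalized)


example_records = [
    {"name": "PS", "normalized_name": "polystyrene"},
    {"name": "Polystyrene", "normalized_name": "polystyrene"},
    {"name": "PMMA", "normalized_name": None},
    {"name": "pmma"},
    {"name": "Nylon 6"},
]
estimate = estimate_unique_polymers(example_records)
```

Note that this estimate still counts two differently spelled, unnormalized names of the same polymer separately; only case variants are merged.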
“…This is done using a dictionary lookup on a data set of polymer name clusters that were normalized using the workflow described in Ref. 31. Note that we do not normalize all polymer names but only the ones which are included in our dictionary.…”
Section: Methods (citation type: mentioning)
confidence: 99%
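
A dictionary lookup of this kind might look like the sketch below. It is illustrative only: the cluster data, the function names `build_lookup` and `normalize_name`, and the case-insensitive matching are assumptions made here, not details taken from the cited workflow.

```python
def build_lookup(clusters):
    """Flatten name clusters into a variant -> canonical-name dictionary.

    Assumption: `clusters` maps one canonical name to a list of variants.
    """
    lookup = {}
    for canonical, variants in clusters.items():
        for variant in [canonical, *variants]:
            lookup[variant.casefold()] = canonical
    return lookup


def normalize_name(name, lookup):
    """Return the canonical name, or None if the name is not in the dictionary."""
    return lookup.get(name.strip().casefold())


# Hypothetical clusters, for illustration only.
clusters = {
    "polystyrene": ["PS", "poly(styrene)"],
    "poly(methyl methacrylate)": ["PMMA"],
}
lookup = build_lookup(clusters)

known = normalize_name("pmma", lookup)       # found in the dictionary
unknown = normalize_name("Nylon 6", lookup)  # not in the dictionary
```

Returning `None` for names outside the dictionary mirrors the caveat in the statement: only names present in the dictionary are normalized, and everything else is left as reported.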
“…42,43 An orthogonal approach is to use ML itself to find polymer data that is published in the literature, an approach known as natural language processing (NLP). There has been some promising progress in this area notably in identifying polymer names, 46 recognizing that the same polymer is referred to by different names, 47 developing pipelines for property extraction, 48,49 and generating knowledge via word embeddings, which represent words as vectors. 50 However, the issue of deciphering the polymer name and capturing all of the relevant metadata is still not fully solved.…”
Section: Creating An ML Pipeline (citation type: mentioning)
confidence: 99%
“…Since most high-throughput screening assays are dependent on the experimental conditions on which they are acquired, it is difficult to develop models that translate well across multiple systems (different cell types, nucleic acid cargo types, or model organisms for instance). Minimum information standards and biomaterial-specific ontologies have been proposed to mitigate the tediousness of retrieving published data for analysis, and natural language processing methods must be employed to automate data extraction from published literature. Systematic organization, labeling, and curation of experimental data must be enforced by journals to facilitate data mining efforts. Third, computational models must be capable of bridging multiple length scales in hierarchical materials, incorporate theoretical insights on dynamic biointerfacial processes, and be sufficiently robust to experimental noise.…”
Section: Data-driven Design Of Polymeric Vectors (citation type: mentioning)
confidence: 99%