2020
DOI: 10.1007/978-3-030-50417-5_23

SciNER: Extracting Named Entities from Scientific Literature

Abstract: The automated extraction of claims from scientific papers is difficult due to the ambiguity and variability inherent in natural language. Even apparently simple tasks, such as isolating reported values for physical quantities (e.g., "the melting point of X is Y"), can be complicated by such factors as domain-specific conventions about how named entities (the X in the example) are referenced. Although there are domain-specific toolkits that can handle such complications in certain areas, a generaliz…
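To make the difficulty the abstract describes concrete, here is a minimal sketch of the kind of naive pattern matching it alludes to, with an entirely hypothetical regular expression. It captures the simple phrasing but misses domain-specific entity references (abbreviations, chemical formulas, polymer nomenclature), which is the gap a trained NER model targets.

import re

# Naive pattern for "the melting point of X is Y" (hypothetical illustration).
# It assumes the entity X is a single capitalized word and the value Y is a
# number with a unit, so it misses forms like "PMMA melts at ~160 C" or
# multi-token names like "poly(methyl methacrylate)" entirely.
PATTERN = re.compile(
    r"the melting point of (?P<entity>[A-Z][\w()-]*) is "
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>°?C|K)"
)

match = PATTERN.search("We found that the melting point of Polystyrene is 240 °C.")
if match:
    print(match.group("entity"), match.group("value"), match.group("unit"))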

Cited by 11 publications (7 citation statements)
References 27 publications (28 reference statements)
Citation types: 0 supporting, 7 mentioning, 0 contrasting
“…In another study, using a domain-specific ontology for the Hotels domain, the authors have shown accurate extraction of named entities [2]. Hong et al [5] discussed the implementation of scientific named entity extraction. This applies only to entities that are scientific names in a given text; it could be extended to generalized entity extraction that does not require a restricted entity set.…”
Section: Domain-specific Approaches For Entity Extraction (mentioning)
confidence: 99%
“…While SpaCy is easy to use, it lacks flexibility: its end-to-end encapsulation does not expose many tunable parameters. Thus we also explore the use of a Keras-LSTM model that we developed in previous work for identification of polymers in materials science literature (Hong et al, 2020b). This model is based on the Bidirectional LSTM network with a conditional random field (CRF) layer added on top.…”
Section: Spacy and Keras-long-short Term Memory Modelsmentioning
confidence: 99%
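For readers unfamiliar with the architecture this statement references, the following is a minimal sketch of a Keras BiLSTM token tagger, not the authors' actual model: the vocabulary size, tag set, and layer widths are hypothetical, and the cited work adds a conditional random field layer (for example, tensorflow_addons.layers.CRF) in place of the per-token softmax used here.

import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000  # assumed vocabulary size
EMBED_DIM = 100     # assumed embedding dimension
NUM_TAGS = 5        # hypothetical IOB tag set, e.g. B-POLYMER, I-POLYMER, O
MAX_LEN = 128       # assumed maximum sentence length

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(inputs)  # pad id 0 is masked
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)  # one output per token
outputs = layers.Dense(NUM_TAGS, activation="softmax")(x)  # the cited model uses a CRF here

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")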
“…The Keras LSTM model requires external word vectors since, unlike SpaCy, it does not include a word embedding model. To explore the effect of different word embedding models, we trained both BERT (Devlin et al, 2018), a top-performing language model developed by Google, and FastText (Bojanowski et al, 2016), a model shown to have outperformed traditional Word2Vec models such as CBOW and Skip-gram in our previous work (Hong et al, 2020b). While Google has released pre-trained BERT models, and researchers often build upon these models by "fine-tuning" them with additional training on small external datasets, this approach is not suitable for our problem, as the vocabulary used in CORD-19 is very different from the datasets used to train these models.…”
Section: Word Embedding Models (mentioning)
confidence: 99%
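As an illustration of the FastText alternative mentioned above, here is a minimal sketch of training subword embeddings on a domain corpus with gensim's FastText implementation; the toy sentences and hyperparameters are hypothetical, not those of the cited work.

from gensim.models import FastText

# Tokenized sentences from the domain corpus (toy stand-ins).
sentences = [
    ["the", "melting", "point", "of", "polyethylene", "is", "unknown"],
    ["polystyrene", "is", "a", "common", "thermoplastic"],
]

model = FastText(
    sentences,
    vector_size=100,  # embedding dimension (assumed)
    window=5,
    min_count=1,
    sg=1,             # skip-gram objective
    epochs=10,
)

# Character n-grams let FastText produce vectors even for tokens unseen
# during training, which helps with long chemical and polymer names.
vector = model.wv["polypropylene"]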
“…The lack of large data sets is currently being tackled by several efforts to compile data, including through natural language processing and the generation of large computational data sets. It is also being tackled by transfer learning, a growing technique in ML where knowledge is transferred between tasks (e.g., prediction of properties), domains (e.g., scientific literature or English literature), or both, as detailed in an excellent review. Most commonly, knowledge is transferred from a task or domain where data is plentiful to a task or domain where data is limited.…”
(mentioning)
confidence: 99%
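To make the transfer-learning idea in this statement concrete, here is a minimal Keras sketch with entirely hypothetical shapes and tasks: a base network standing in for a model pretrained on a data-rich source task is frozen, and only a small new head is trained on the data-limited target task.

import tensorflow as tf
from tensorflow.keras import layers

# Stand-in for a network pretrained on a data-plentiful source task.
base = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(128, activation="relu"),
])
base.trainable = False  # freeze the transferred knowledge

# Small new head trained on the data-limited target task,
# e.g. regression on a material property (hypothetical).
model = tf.keras.Sequential([
    base,
    layers.Dense(32, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")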