2020
DOI: 10.1186/s12859-020-03834-6

Improving biomedical named entity recognition with syntactic information

Abstract: Background: Biomedical named entity recognition (BioNER) is an important task for understanding biomedical texts, which can be challenging due to the lack of large-scale labeled training data and domain knowledge. To address the challenge, in addition to using powerful encoders (e.g., biLSTM and BioBERT), one possible method is to leverage extra knowledge that is easy to obtain. Previous studies have shown that auto-processed syntactic information can be a useful resource to improve model performance, but their…

citations
Cited by 34 publications
(7 citation statements)
references
References 42 publications
“…As we explained in the previous section, existing approaches have been limited by the difficulty of integrating hierarchical information such as a parse tree into a task that is linear in nature. Thus, either they make limited use of such syntactic information [128,129], or they develop ad hoc architectures that result in more complex, less generic, and less efficient models [95,136,137]. What we know for sure is that the use of information from parsers is beneficial but, since they have been tested on different data sets, it is difficult to determine which of those approaches for incorporating parsing information is more effective in general terms.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, this will require a reduction of semantic parsing to sequence labeling. Some NER systems, notably [129], resort to pre-trained language models. End-to-end models based on large pre-trained language models suffer from high computational costs, with the associated environmental costs [141]; reduced inclusivity in multilingual settings (e.g., GPT-3 is currently only available for English, and training it for a new language has been estimated to cost more than USD 4 million with current hardware [142]); as well as lack of explainability, which can be provided with parsing.…”
Section: Discussion (mentioning)
confidence: 99%
“…While available data includes trusted curated sets, experimental data provided by various depositors, as well as literature and biomedical publications that are annotated manually by indexers (MEDLINE, 2021); an abundance of data can be extracted from unstructured text using named-entity recognition software (Ratinov, 2009). Current named-entity recognition approaches include dictionary matching, use of rules to recognize specialized terminology, and context analysis using statistical and neural language models (Sayle et al, 2011; Vazquez et al, 2011; Jessop et al, 2012; Rocktäschel et al, 2012; Gurulingappa et al, 2013; Lowe and Sayle, 2015; Pletscher-Frankild et al, 2015; Song et al, 2018; Devlin et al, 2019; Lee et al, 2020; Tian et al, 2020). To produce data for the PubChem literature knowledge panels, entities are annotated in a PubMed record using a third-party named-entity recognition software, LeadMine (Lowe and Sayle, 2015), and matched to chemical synonyms in the PubChem Compound database and to gene, protein, and disease names, as described in Materials and Methods.…”
Section: Introduction (mentioning)
confidence: 99%