Improving biomedical named entity recognition with syntactic information

Tian, Yuanhe; Wang, Shen; Yan, Shuicheng; Xia, Fei; He, Min; Li, Kenli

doi:10.1186/s12859-020-03834-6

Cited by 34 publications

(7 citation statements)

References 42 publications

“…As we explained in the previous section, existing approaches have been limited by the difficulty of integrating hierarchical information such as a parse tree into a task that is linear in nature. Thus, either they make limited use of such syntactic information [128,129], or they develop ad hoc architectures that result in more complex, less generic, and less efficient models [95,136,137]. What we know for sure is that the use of information from parsers is beneficial but, since they have been tested on different data sets, it is difficult to determine which of those approaches for incorporating parsing information is more effective in general terms.…”

Section: Discussion (mentioning)

confidence: 99%

“…However, this will require a reduction of semantic parsing to sequence labeling. Some NER systems, notably [129], resort to pre-trained language models. End-to-end models based on large pre-trained language models suffer from high computational costs, with the associated environmental costs [141]; reduced inclusivity in multilingual settings (e.g., GPT-3 is currently only available for English, and training it for a new language has been estimated to cost more than USD 4 million with current hardware [142]); as well as lack of explainability, which can be provided with parsing.…”

Section: Discussion (mentioning)

confidence: 99%

“…Although these studies demonstrate the potential benefits of incorporating syntactic information, they are limited in either treating noisy syntactic information as gold references for training their taggers, or using direct concatenation to combine that information with context information without weighing it with respect to its contribution to the NER task. Tian et al [129] tried to find a better way to incorporate syntactic information into deep learning models for NER. For this purpose, they built BioKMNER, a NER model for biomedical texts based on Key-Value Memory Networks (KVMN) [130].…”

Section: Syntactic Information As a Feature For Sequence Labeling NER (mentioning)

confidence: 99%
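The contrast drawn in the quote above, between directly concatenating syntactic information and weighing it by its contribution to the NER task, can be illustrated with a generic key-value memory read step. This is a minimal sketch assuming dot-product attention over memory slots; the function names `kvmn_read` and `fuse`, the choice of what is stored as keys and values, and the additive fusion are illustrative assumptions, not the BioKMNER implementation.

```python
import numpy as np

def kvmn_read(h, keys, values):
    """Attend over memory slots and return a weighted sum of their values.

    h:      (d,) hidden vector for the current token (e.g., from a BERT-style encoder)
    keys:   (n, d) embeddings of context features stored in memory (assumed)
    values: (n, d) embeddings of the syntactic information paired with each key (assumed)
    """
    scores = keys @ h                  # (n,) relevance of each memory slot to the token
    scores = scores - scores.max()     # shift for a numerically stable softmax
    exp = np.exp(scores)
    weights = exp / exp.sum()          # (n,) attention weights summing to 1
    return weights @ values, weights   # (d,) memory output, (n,) weights

def fuse(h, o):
    # Hypothetical fusion: element-wise sum of the token vector and the weighted
    # memory output, instead of concatenating unweighted syntactic features.
    return h + o

h = np.array([1.0, 0.0])
keys = np.array([[1.0, 0.0], [0.0, 1.0]])
values = np.array([[2.0, 0.0], [0.0, 2.0]])
o, w = kvmn_read(h, keys, values)
print(w, fuse(h, o))
```

Because the weights come from a softmax, a memory slot whose key scores low against the token contributes little to the output, which is the kind of weighting that plain concatenation lacks.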

“…The system by Tian et al [129] is based on bioBERT [131], a pre-trained biomedical language model designed for biomedical text mining tasks. It is worth remarking that there is a recent trend in end-to-end NLP systems that use powerful pre-trained language models with huge parameter spaces based on transformers to solve a variety of tasks, as in the case of the Bidirectional Encoder Representations from Transformers (BERT) [132] or Generative Pre-trained Transformer 3 (GPT-3) models [133].…”

Section: Syntactic Information As a Feature For Sequence Labeling NER (mentioning)

confidence: 99%

On the Use of Parsing for Named Entity Recognition

2021

Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential to extract knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can be helpful to determine whether or not a certain portion of it is an entity and if so, to establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have chosen to consider shallow approaches to deal with text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.
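Casting parsing itself as sequence labeling, as the abstract proposes, relies on encoding a tree as one label per token. The sketch below uses a relative head-offset encoding, which is one of several encodings described in the parsing-as-sequence-labeling literature and is not necessarily the one this paper adopts; the functions `encode` and `decode` and the label format `<offset>_<relation>` are illustrative assumptions.

```python
from typing import List, Tuple

def encode(heads: List[int], rels: List[str]) -> List[str]:
    """Encode a dependency tree as one label per token.

    heads[i] is the 1-based index of the head of token i+1 (0 = root).
    Each label is "<signed offset>_<relation>", with offset = head - position.
    """
    labels = []
    for pos, (head, rel) in enumerate(zip(heads, rels), start=1):
        offset = head - pos              # the root token gets offset -pos
        labels.append(f"{offset:+d}_{rel}")
    return labels

def decode(labels: List[str]) -> Tuple[List[int], List[str]]:
    """Invert encode(): recover head indices and relations from the labels."""
    heads, rels = [], []
    for pos, label in enumerate(labels, start=1):
        offset, rel = label.split("_", 1)  # offset part never contains "_"
        heads.append(pos + int(offset))
        rels.append(rel)
    return heads, rels

labels = encode([2, 0, 2], ["nsubj", "root", "obj"])
print(labels)          # "Aspirin inhibits COX-1"
print(decode(labels))
```

For "Aspirin inhibits COX-1", heads `[2, 0, 2]` with relations `nsubj`, `root`, `obj` become `+1_nsubj`, `-2_root`, `-1_obj`. Since each token now carries a single label, a standard sequence tagger can predict the tree, and the same labels could in principle serve as per-token features for an NER tagger.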

“…While available data includes trusted curated sets, experimental data provided by various depositors, as well as literature and biomedical publications that are annotated manually by indexers (MEDLINE, 2021); an abundance of data can be extracted from unstructured text using named-entity recognition software (Ratinov, 2009). Current named-entity recognition approaches include dictionary matching, use of rules to recognize specialized terminology, and context analysis using statistical and neural language models (Sayle et al, 2011; Vazquez et al, 2011; Jessop et al, 2012; Rocktäschel et al, 2012; Gurulingappa et al, 2013; Lowe and Sayle, 2015; Pletscher-Frankild et al, 2015; Song et al, 2018; Devlin et al, 2019; Lee et al, 2020; Tian et al, 2020). To produce data for the PubChem literature knowledge panels, entities are annotated in a PubMed record using a third-party named-entity recognition software, LeadMine (Lowe and Sayle, 2015), and matched to chemical synonyms in the PubChem Compound database and to gene, protein, and disease names, as described in Materials and Methods.…”

Section: Introduction (mentioning)

confidence: 99%
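Of the approaches listed in the quote above, dictionary matching is the simplest to illustrate. The following is a minimal sketch of greedy longest-match lookup over token n-grams, assuming whitespace tokenization and lowercased lexicon keys; `dictionary_ner` and its lexicon format are hypothetical and do not reflect how LeadMine works internally.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)

def dictionary_ner(tokens: List[str],
                   lexicon: Dict[Tuple[str, ...], str],
                   max_len: int = 5) -> List[Span]:
    """Greedy longest-match lookup of token n-grams in a lexicon."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lexicon:
                match = (i, i + n, lexicon[key])
                break
        if match:
            spans.append(match)
            i = match[1]  # skip past the matched entity
        else:
            i += 1
    return spans

tokens = "Acetylsalicylic acid reduces breast cancer risk".split()
lexicon = {("acetylsalicylic", "acid"): "Chemical",
           ("breast", "cancer"): "Disease",
           ("cancer",): "Disease"}
print(dictionary_ner(tokens, lexicon))
```

Preferring the longest match means that `breast cancer` is returned as one Disease span rather than the shorter entry `cancer`. Systems such as those cited combine lookup of this kind with rules or statistical and neural context models.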

Discovering and Summarizing Relationships Between Chemicals, Genes, Proteins, and Diseases in PubChem

Zaslavsky; Cheng; Gindulytė; et al.

2021

Front. Res. Metr. Anal.

The literature knowledge panels developed and implemented in PubChem are described. These help to uncover and summarize important relationships between chemicals, genes, proteins, and diseases by analyzing co-occurrences of terms in biomedical literature abstracts. Named entities in PubMed records are matched with chemical names in PubChem, disease names in Medical Subject Headings (MeSH), and gene/protein names in popular gene/protein information resources, and the most closely related entities are identified using statistical analysis and relevance-based sampling. Knowledge panels for the co-occurrence of chemical, disease, and gene/protein entities are included in PubChem Compound, Protein, and Gene pages, summarizing these in a compact form. Statistical methods for removing redundancy and estimating relevance scores are discussed, along with benefits and pitfalls of relying on automated (i.e., not human-curated) methods operating on data from multiple heterogeneous sources.
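The co-occurrence analysis described in this abstract can be illustrated with a simple association score. This sketch counts, per abstract, which chemicals and diseases are mentioned together and scores each pair with pointwise mutual information (PMI). PMI is a swapped-in, commonly used measure chosen for illustration, not the statistical analysis or relevance-based sampling that PubChem describes, and `cooccurrence_pmi` is a hypothetical helper.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Set, Tuple

def cooccurrence_pmi(docs: Iterable[Tuple[Set[str], Set[str]]]) -> Dict[Tuple[str, str], float]:
    """Score chemical-disease pairs by PMI over abstracts.

    Each doc is (chemicals, diseases) mentioned in one abstract; every entity
    is counted at most once per abstract.
    """
    n_docs = 0
    chem_counts, dis_counts, pair_counts = Counter(), Counter(), Counter()
    for chems, diseases in docs:
        n_docs += 1
        chem_counts.update(chems)
        dis_counts.update(diseases)
        pair_counts.update(product(chems, diseases))
    scores = {}
    for (c, d), n_cd in pair_counts.items():
        # PMI = log(P(c, d) / (P(c) * P(d))), probabilities estimated per abstract
        scores[(c, d)] = math.log(n_cd * n_docs / (chem_counts[c] * dis_counts[d]))
    return scores

docs = [({"aspirin"}, {"pain"}),
        ({"aspirin"}, {"pain"}),
        ({"caffeine"}, {"fatigue"}),
        ({"aspirin", "caffeine"}, {"fatigue"})]
for pair, score in sorted(cooccurrence_pmi(docs).items()):
    print(pair, round(score, 3))
```

A pair that appears together more often than its members' individual frequencies would predict gets a positive score, and a pair that co-occurs less often than expected gets a negative one; pairs that never co-occur are simply absent.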
