“…For example, multilingual BERT (Devlin et al., 2019) was trained on Wikipedia articles from more than 100 languages. Although reported performance improvements demonstrate that multilingual BERT can be used in monolingual (Hakala and Pyysalo, 2019), multilingual (Tsai et al., 2019), and cross-lingual settings (Wu and Dredze, 2019), it has been questioned whether multilingual BERT is truly multilingual (Pires et al., 2019; Singh et al., 2019; Libovický et al., 2019). Therefore, we will investigate the benefits of aligning its embeddings in our experiments.…”