2019
DOI: 10.1093/bioinformatics/btz682

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Abstract: Motivation: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift …
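The paper presents BioBERT as BERT-compatible weights pre-trained on biomedical corpora, intended to be dropped into standard downstream pipelines. As a rough illustration only, the sketch below loads a BioBERT checkpoint for token-level prediction with the Hugging Face transformers library; the checkpoint ID dmis-lab/biobert-base-cased-v1.1 and the three-label tag set are assumptions, not details taken from this page.

# Minimal sketch (not from the paper): load a BioBERT checkpoint and attach a
# token-classification head for biomedical NER. Checkpoint ID and label count
# below are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "dmis-lab/biobert-base-cased-v1.1"   # assumed public checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=3)

inputs = tokenizer("Mutations in BRCA1 are associated with breast cancer.",
                   return_tensors="pt")
logits = model(**inputs).logits    # shape: (batch, sequence_length, num_labels)
print(logits.shape)
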


Cited by 3,933 publications (3,149 citation statements)
References 33 publications
“…While our experiments demonstrate this improvement only in three applications, we believe that it can generalize to other prediction models that rely on combinations of formalized and natural language knowledge. Our normalization method is currently limited by its reliance on lexical matching to identify mentions of ontology classes, while novel natural language methods often use machine learning models for this purpose as well (Lee et al, 2020). In future work, more experiments with different named entity recognition and normalization approaches are needed to improve our method.…”
Section: Discussion (mentioning)
confidence: 99%
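The excerpt above contrasts lexical matching against learned mention detectors for linking text spans to ontology classes. A minimal sketch of what dictionary-based lexical matching looks like in practice is given below; the ontology fragment and span-length limit are illustrative assumptions, not the cited authors' implementation.

# Minimal sketch of lexical-matching normalization: mentions are linked to
# ontology classes by exact string lookup, with no machine learning involved.
import re

def build_lexicon(ontology):
    """Map lowercased labels and synonyms to ontology class IDs."""
    lexicon = {}
    for class_id, names in ontology.items():
        for name in names:
            lexicon[name.lower()] = class_id
    return lexicon

def normalize_mentions(text, lexicon, max_len=4):
    """Match word n-grams (up to max_len words) against the lexicon."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    hits = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            span = " ".join(tokens[i:j])
            if span in lexicon:
                hits.append((span, lexicon[span]))
    return hits

# illustrative ontology fragment, for demonstration only
ontology = {"HP:0003002": ["breast carcinoma", "breast cancer"],
            "HP:0002664": ["neoplasm", "tumor"]}
print(normalize_mentions("Family history of breast cancer and a benign tumor.",
                         build_lexicon(ontology)))
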
“…We optimize BERT by first fine-tuning it with a domain-specific text corpus then fine-tune the resulting model for the different subtasks. This has been shown to be a strong baseline for various NLP tasks [2,20,55] including ABSA. Fine-tuning LMs for specific subtasks.…”
Section: Fine-tuning Pre-trained Language Models (mentioning)
confidence: 99%
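The excerpt describes a two-stage recipe: continue pre-training BERT on a domain-specific corpus, then fine-tune the adapted weights on each subtask. A minimal sketch of that flow with the Hugging Face transformers library follows; the base checkpoint, output directory, and label count are assumptions, and both training loops are omitted.

# Rough sketch (not the cited authors' code) of domain-adaptive pre-training
# followed by task fine-tuning. Training loops are omitted for brevity.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          AutoModelForSequenceClassification)

base = "bert-base-uncased"                       # assumed starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)

# Stage 1: masked-language-model training on the domain corpus.
mlm_model = AutoModelForMaskedLM.from_pretrained(base)
# ... run MLM training on domain text here ...
mlm_model.save_pretrained("bert-domain-adapted")  # illustrative output path
tokenizer.save_pretrained("bert-domain-adapted")

# Stage 2: fine-tune the adapted weights on the downstream subtask
# (e.g. a three-way aspect sentiment classifier, as in ABSA).
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-domain-adapted", num_labels=3)
# ... fine-tune clf_model on labeled task data here ...
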
“…However, to address several limitations, we choose to train our own clinical BERT model in this work. First, existing models are initialized from BioBERT [39] or BERT BASE [16], though SciBERT [6] outperforms BioBERT on a number of downstream tasks. Secondly, existing models do not satisfactorily encode the personal health identifiers (PHI) within the notes (e.g., [**2126-9-19**]), either leaving them as is, or removing them altogether.…”
Section: Pretrained Clinical Embeddings (mentioning)
confidence: 99%
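The excerpt notes that de-identification placeholders such as [**2126-9-19**] are either left as-is or stripped by existing clinical models. One common workaround, sketched below under the assumption of a Hugging Face tokenizer, is to collapse each placeholder into a dedicated special token so it is not shattered into meaningless subwords; the regex and token name are illustrative, not the cited authors' approach.

# Minimal sketch: map MIMIC-style masked PHI spans to one special token and
# resize the embedding matrix so the model can learn a representation for it.
import re
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

PHI_PATTERN = re.compile(r"\[\*\*.*?\*\*\]")   # matches spans like [**2126-9-19**]

def mask_phi(note: str) -> str:
    """Replace every PHI placeholder with a single dedicated token."""
    return PHI_PATTERN.sub("[PHI]", note)

tokenizer.add_special_tokens({"additional_special_tokens": ["[PHI]"]})
model.resize_token_embeddings(len(tokenizer))

note = "Patient admitted on [**2126-9-19**] with chest pain."
print(tokenizer.tokenize(mask_phi(note)))      # '[PHI]' survives as one token
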