Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1371

SciBERT: A Pretrained Language Model for Scientific Text

Abstract: Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SCIBERT, a pretrained language model based on BERT (Devlin et al., 2019) to address the lack of high-quality, large-scale labeled scientific data. SCIBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence classification and dependency parsing…
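The released model can be used directly to produce contextual embeddings for scientific text. The following is a minimal sketch, assuming the Hugging Face Transformers library and the publicly released allenai/scibert_scivocab_uncased checkpoint; it is an illustration, not code from the paper itself.

# Minimal sketch: encode a scientific sentence with SciBERT.
# Assumes the Hugging Face Transformers library and the released
# allenai/scibert_scivocab_uncased checkpoint (not taken from the paper).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

sentence = "The glomerular filtration rate was measured after drug administration."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings: (batch, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)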

Cited by 1,794 publications (1,602 citation statements). References: 23 publications.
“…However, to address several limitations, we choose to train our own clinical BERT model in this work. First, existing models are initialized from BioBERT [39] or BERT BASE [16], though SciBERT [6] outperforms BioBERT on a number of downstream tasks. Secondly, existing models do not satisfactorily encode the personal health identifiers (PHI) within the notes (e.g., [**2126-9-19**]), either leaving them as is, or removing them altogether.…”
Section: Pretrained Clinical Embeddings
Citation type: mentioning
confidence: 99%
“…Initialization. Unlike previous approaches, we initialize our model from SciBERT, which has been shown to have better performance on a variety of benchmarking tasks [6].…”
Section: Baseline Clinical Bert Pretraining
Citation type: mentioning
confidence: 99%
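Initializing a domain-specific model from SciBERT, as described in the citation above, amounts to loading the released weights as the starting point for further masked-language-model pretraining. The sketch below illustrates this under stated assumptions: the clinical-notes file path and training settings are hypothetical placeholders, not the cited paper's actual configuration.

# Sketch: continue masked-LM pretraining from the SciBERT checkpoint.
# The corpus path and hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForMaskedLM.from_pretrained("allenai/scibert_scivocab_uncased")

# Hypothetical plain-text clinical corpus, one note per line.
dataset = load_dataset("text", data_files={"train": "clinical_notes.txt"})["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="clinical-scibert",
                         per_device_train_batch_size=16, num_train_epochs=1)
Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()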
“…We also extend these efforts to incorporate PMC full text articles, which provide access to approximately 6 times as many relations per relevant document than PubMed abstracts (5.1 relations per full-text article vs .8 relations per abstract only article when pooling across all 3 relation types), as well as transcription factors after lending credibility to the shared relation types through comparison with those first published in iX. Finally, we compare a weakly supervised approach, using Snorkel [2], for relation extraction (RE) to one based on transfer learning, using SciBERT [3], in order to evaluate what advantages, if any, arise from the auxiliary engineering effort inherent to weak supervision. In summary, this study is intended to demonstrate the viability of weak supervision for biological relation extraction in scientific literature as well as share a large database of T cell-specific cytokine and transcription factor relationships.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
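The transfer-learning side of the comparison above typically reduces to fine-tuning SciBERT as a classifier over candidate relation mentions. The sketch below is a hypothetical illustration of that setup: the relation label set and the inline entity-marker scheme are assumptions for demonstration, not the cited study's exact protocol.

# Sketch: SciBERT as a relation classifier over marked entity pairs.
# Label set and entity markers are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["no_relation", "induces", "secreted_by"]  # hypothetical relation types

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased", num_labels=len(labels)
)

# Candidate sentence with the two entity mentions marked inline.
text = "[E1] IL-12 [/E1] drives Th1 differentiation via [E2] STAT4 [/E2]."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# The classification head is randomly initialized here; predictions are only
# meaningful after fine-tuning on labeled relation examples.
print(labels[logits.argmax(dim=-1).item()])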