Since BERT models were found to be effective for a wide range of NLP tasks (Devlin et al., 2019), several efforts have been directed towards improving them through more efficient training strategies (Yang et al., 2019b; Sanh et al., 2019; Lan et al., 2019), training them for different domains (Lee et al., 2019a; Lee and Hsiang, 2019; Chalkidis et al., 2020; Gururangan et al., 2020), and training them for different languages (Devlin, 2018; de Vries et al., 2019; Le et al., 2020; Martin et al., 2020; Delobelle et al., 2020; Cañete et al., 2020). Within the clinical domain, such models include the BioBERT models pretrained on PubMed abstracts and PMC full-text articles (Lee et al., 2019a), SciBERT trained on scientific text (Beltagy et al., 2019), the clinicalBERT models trained on patient notes from the MIMIC-III corpus (Johnson et al., 2016), sometimes as a continuation of the BioBERT models (Alsentzer et al., 2019), and the BlueBERT models, which also use PubMed abstracts and MIMIC-III patient notes for training (Peng et al., 2019).