Many studies pre-trained BERT models on biomedical literature (Lee et al., 2020; Beltagy et al., 2019) or clinical notes (Alsentzer et al., 2019; Peng et al., 2019) to develop domain-specific language models, and these studies showed that domain-specific models generally outperform off-the-shelf models on varied clinical NLP tasks, such as clinical NER (Yang et al., 2020b; Greenspan et al., 2020), relation extraction, sentence similarity (Peng et al., 2019), negation detection (Lin et al., 2020), and concept normalization. However, for clinical text classification, which generally requires a series of clinical notes as input (e.g., automatic ICD coding, clinical outcome prediction), BERT does not always perform well, likely because of its computational cost and fixed-length input constraint (Li and Yu, 2020; Makarenkov and Rokach, 2020). Our work is also built on top of Transformers, with an emphasis on effectively representing document sequences, such as all of a patient's clinical notes from an inpatient visit.
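To make the fixed-length constraint concrete, below is a minimal sketch, not the method of this paper or of any cited work, of the common chunk-and-pool workaround: a long concatenation of notes exceeds BERT's 512-token limit, so it is split into overlapping windows that are encoded separately and then pooled. The checkpoint name (Bio_ClinicalBERT), the window/stride values, and the mean-pooling step are all illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint: any BERT-style model exhibits the same 512-token limit.
MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

# Stand-in for all notes from one inpatient visit, concatenated.
notes = " ".join(["Patient admitted with chest pain and dyspnea."] * 300)

token_ids = tokenizer(notes, add_special_tokens=False)["input_ids"]
print(f"total tokens: {len(token_ids)}")  # far beyond BERT's 512-token limit

# Split into overlapping windows of 510 tokens (leaving room for [CLS]/[SEP]).
# Window and stride sizes are illustrative, not values from the paper.
window, stride = 510, 384
chunks = [token_ids[i:i + window] for i in range(0, len(token_ids), stride)]

chunk_vecs = []
with torch.no_grad():
    for chunk in chunks:
        ids = torch.tensor([[tokenizer.cls_token_id, *chunk, tokenizer.sep_token_id]])
        out = model(input_ids=ids)
        chunk_vecs.append(out.last_hidden_state[:, 0])  # [CLS] vector per chunk

# Naive mean-pooling over chunk vectors as a document representation;
# hierarchical models replace this step with a learned aggregator.
doc_vec = torch.cat(chunk_vecs).mean(dim=0)
print(doc_vec.shape)  # hidden size, e.g., torch.Size([768])
```

The final mean-pooling step is exactly where such a naive pipeline loses information across chunks, which is the gap that learned aggregation over document sequences aims to close.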