2020
DOI: 10.48550/arXiv.2005.00574
Preprint
Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset

Cited by 2 publications (6 citation statements)
References 35 publications
“…emrQA falls into the category of extractive question answering, aiming to identify answer spans from reference texts instead of generating new answers in a word-by-word fashion. Researchers have attempted to solve emrQA tasks by using word embedding models [28], conditional random fields (CRFs) [29] and transformer-based models [30], among which transformer-based models performed best. In our experiments, we investigate the performance of our pre-trained models using the three largest emrQA subsets: Medication, Relation, and Heart Disease.…”
Section: Question Answering
confidence: 99%
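The extractive setting quoted above can be sketched as a span-selection step: given per-token start and end scores, pick the highest-scoring valid span. A minimal sketch, assuming hypothetical logits (the numbers and tokens below are illustrative, not outputs of the cited models):

```python
def best_span(start_logits, end_logits, max_len=30):
    """Return the (start, end) token indices maximizing the combined
    start + end score, with end >= start and a bounded span length."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

# Toy example: four tokens with made-up scores.
tokens = ["He", "takes", "aspirin", "daily"]
start = [0.1, 0.2, 2.5, 0.0]
end = [0.0, 0.1, 2.0, 0.5]
s, e = best_span(start, end)
print(tokens[s:e + 1])  # → ['aspirin']
```

The key contrast with generative QA is that the answer is always a contiguous slice of the reference text, never newly produced tokens.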
“…F1-score is a looser metric derived from token-level precision and recall, which measures the overlap between the predictions and the targets. We generate train-dev-test splits by following the instructions of Yue et al [28]. The training sets of the relation and medication subsets are randomly under-sampled to reduce training time.…”
Section: Question Answering
confidence: 99%
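The token-level F1 described in this citation statement can be computed directly from bag-of-token overlap between a predicted and a gold answer string. A minimal sketch (whitespace tokenization is an assumption; evaluation scripts typically also lowercase and strip punctuation):

```python
from collections import Counter

def token_f1(prediction: str, target: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over
    the multiset of overlapping tokens."""
    pred_tokens = prediction.split()
    gold_tokens = target.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# A prediction that over-generates one token is partially credited,
# which is why F1 is "looser" than exact match.
print(round(token_f1("aspirin 81 mg daily", "aspirin 81 mg"), 3))  # → 0.857
```

Exact match would score the example above as 0, while token-level F1 rewards the three overlapping tokens.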
“…emrQA falls into the category of extractive question answering, which aims to identify answer spans from reference contexts instead of generating answers in a word-by-word fashion. Researchers have attempted to solve emrQA tasks by using word embedding models [25], conditional random fields (CRFs) [26] and transformer-based models [27], among which transformer-based models performed best. In our experiments, we investigated the performance of our pre-trained models using the three largest emrQA subsets: Medication, Relation and Heart Disease.…”
Section: Question Answering
confidence: 99%
“…F1-score is a looser metric derived from token-level precision and recall, which measures the overlap between the predictions and the targets. We generated train-dev-test splits by following the instructions of [25], where the training sets of the relation and medication subsets were randomly under-sampled to reduce training time. Based on their experience, performance would not be compromised after under-sampling.…”
Section: Question Answering
confidence: 99%