Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1258
emrQA: A Large Corpus for Question Answering on Electronic Medical Records

Abstract: We propose a novel methodology to generate domain-specific, large-scale question answering (QA) datasets by re-purposing existing annotations for other NLP tasks. We demonstrate an instance of this methodology in generating a large-scale QA dataset for electronic medical records by leveraging existing expert annotations on clinical notes for various NLP tasks from the community-shared i2b2 datasets. The resulting corpus (emrQA) has 1 million question-logical form and 400,000+ question-answer evidence pairs. W…
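To make the re-purposing idea concrete, here is a minimal, illustrative Python sketch of template-based QA generation in the spirit the abstract describes: an existing expert annotation is combined with question templates to yield question-answer-evidence triples. All names here (Annotation, QUESTION_TEMPLATES, generate_qa_pairs) and the sample data are assumptions for illustration, not the authors' actual pipeline.

```python
# Illustrative sketch of re-purposing an i2b2-style annotation into QA pairs.
# All identifiers and data are invented for illustration.
from dataclasses import dataclass

@dataclass
class Annotation:
    """A hypothetical i2b2-style expert annotation on a clinical note."""
    entity: str       # e.g. "warfarin"
    entity_type: str  # e.g. "medication"
    attribute: str    # e.g. "dosage"
    value: str        # the annotated answer, e.g. "5 mg daily"
    evidence: str     # the sentence in the note that carries the answer

# Question templates with an {entity} slot, keyed by annotation type.
QUESTION_TEMPLATES = {
    ("medication", "dosage"): [
        "What is the dosage of {entity}?",
        "How much {entity} does the patient take?",
    ],
}

def generate_qa_pairs(annotation: Annotation):
    """Instantiate every matching template with the annotated entity,
    yielding (question, answer, evidence) triples."""
    key = (annotation.entity_type, annotation.attribute)
    for template in QUESTION_TEMPLATES.get(key, []):
        question = template.format(entity=annotation.entity)
        yield question, annotation.value, annotation.evidence

if __name__ == "__main__":
    ann = Annotation("warfarin", "medication", "dosage", "5 mg daily",
                     "Patient continues warfarin 5 mg daily.")
    for question, answer, evidence in generate_qa_pairs(ann):
        print(question, "->", answer)
```

Because every template can be paired with every matching annotation, a modest set of templates multiplied over large annotated corpora yields the dataset sizes the abstract reports.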

Cited by 128 publications (143 citation statements) · References 41 publications (57 reference statements)
“…We followed the SQuAD 2.0 task setting, because it can be critical to have the system refrain from making false suggestions, especially in some clinical applications. The emrQA [6] is a large training set annotated for RCQA in the clinical domain. It was generated by template-based semantic extraction from the i2b2 NLP challenge datasets [7].…”
Section: SQuAD
confidence: 99%
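For context on the SQuAD 2.0 setting this citing work refers to, the sketch below shows an illustrative record in the public SQuAD 2.0 JSON style, where unanswerable questions carry is_impossible: true and an empty answers list, so a trained model learns to abstain rather than guess. The clinical context and questions are invented for illustration; only the field names follow the SQuAD 2.0 schema.

```python
# Illustrative SQuAD 2.0-style record with one answerable and one
# unanswerable question. The data is invented; the field names match
# the public SQuAD 2.0 JSON schema.
import json

record = {
    "context": "Patient was started on warfarin 5 mg daily for atrial fibrillation.",
    "qas": [
        {
            "id": "q1",
            "question": "What is the dosage of warfarin?",
            "answers": [{"text": "5 mg daily", "answer_start": 32}],
            "is_impossible": False,
        },
        {
            # Unanswerable: the context never mentions metformin, so the
            # model should abstain -- the "refrain from making false
            # suggestions" behavior the citing paper highlights.
            "id": "q2",
            "question": "What is the dosage of metformin?",
            "answers": [],
            "is_impossible": True,
        },
    ],
}
print(json.dumps(record, indent=2))
```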
“…However, if we were to scale our approach to real-world application, we would require external data. Therefore, for future work, given more time, we would like to use external datasets such as emrQA (Pampari et al., 2018) and explore multi-task learning, given the similarity of the three tasks, and aim to incorporate other medical tasks for better generalisation of biomedical question answering. We would also want to train the BERT models on biomedical-focused vocabulary and additional data in the future as a baseline to compare against multi-task learning.…”
Section: Question Answering Baseline System Problems
confidence: 99%
“…The patient's notes were then loaded into an annotation tool for them to mark answer text spans. Pampari, Raghavan, Liang, & Peng (2018) developed emrQA, a large clinical QA corpus generated through template-based semantic extraction from the i2b2 NLP challenge datasets. emrQA contains 7.5% why-QAs, but they mainly ask why the patient received a test or treatment, due to the partial coverage of the original challenge annotations.…”
Section: Annotating and Characterizing Clinical Sentences With Explic…
confidence: 99%