Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.410
Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset

Abstract: Machine reading comprehension has made great progress in recent years owing to large-scale annotated datasets. In the clinical domain, however, creating such datasets is quite difficult due to the domain expertise required for annotation. Recently, Pampari et al. (2018) tackled this issue by using expert-annotated question templates and existing i2b2 annotations to create emrQA, the first large-scale dataset for question answering (QA) based on clinical notes. In this paper, we provide an in-depth analysis of …

Cited by 26 publications (43 citation statements)
References 38 publications
“…and relations-related questions, as Yue et al. 75 found that the two subsets are more consistent. We utilized both F1-score and exact match score for evaluation.…”
Section: Acknowledgments
Mentioning confidence: 94%
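The F1 and exact match scores named in this statement are the standard SQuAD-style extractive QA metrics, which are also the ones conventionally reported for emrQA. A minimal sketch of both follows; the normalization steps (lowercasing, stripping punctuation and articles) are common conventions assumed here, not details taken from the citing paper.

```python
# Minimal sketch of SQuAD-style exact match and token-level F1 for
# extractive QA. Normalization choices below are illustrative assumptions.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer span."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Exact match is binary per example, while token-level F1 gives partial credit for overlapping answer spans, which is why both are typically reported together.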
“…SeaReader is proposed to answer questions in clinical medicine using documents extracted from publications in the medical domain. Yue et al. (2020) conduct a thorough analysis of the emrQA dataset (Pampari et al., 2018) and explore the ability of QA systems to utilize clinical domain knowledge and to generalize to unseen questions. Jin et al. (2019) introduce PubMedQA, where questions are derived from article titles and can be answered with their respective abstracts.…”
Section: Related Work
Mentioning confidence: 99%
“…Some recent works have tried to construct medical MRC datasets, such as PubMedQA (Jin et al., 2019), emrQA (Pampari et al., 2018) and HEAD-QA (Vilares and Gómez-Rodríguez, 2019). However, these datasets are either noisy (e.g., because they were generated semi-automatically or via heuristic rules) or too small in annotated scale (Yoon et al., 2019; Yue et al., 2020). Instead, we construct a large-scale medical MRC dataset by collecting 21.7k multiple-choice problems with human-annotated answers from the National Licensed Pharmacist Examination in China.…”
Section: Introduction
Mentioning confidence: 99%
“…To prevent the model from overfitting to specific cases and to encourage it to learn general language patterns, one possible way is to enlarge the training data (Yang et al., 2019). However, clinical texts are usually difficult to obtain, not to mention the tremendous expert effort required for annotation (Yue et al., 2020). To solve this, we introduce our data augmentation method PHICON, which consists of PHI augmentation and Context augmentation.…”
Section: Introduction
Mentioning confidence: 99%
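The statement above names PHICON's two components, PHI augmentation and Context augmentation. As a rough illustration of the PHI-augmentation half only, the sketch below swaps each labeled PHI span in a clinical sentence for a surrogate of the same category while keeping the token labels aligned; the SURROGATES lexicon and the augment_phi helper are hypothetical stand-ins for this sketch, not the authors' code.

```python
# Hedged sketch of PHI-style augmentation for clinical NER/de-identification
# training data: replace each PHI span with a sampled surrogate of the same
# category, preserving the label sequence. Lexicons here are toy examples.
import random

SURROGATES = {
    "NAME": ["John Smith", "Maria Garcia", "Wei Chen"],
    "DATE": ["01/12/2019", "March 3, 2020"],
    "HOSPITAL": ["St. Mary Medical Center", "Lakeside General Hospital"],
}

def augment_phi(tokens, labels):
    """Replace each contiguous PHI span with a surrogate of the same type.

    tokens: list of words; labels: parallel list of PHI categories or "O".
    Returns a new (tokens, labels) pair with surrogates substituted.
    """
    out_tokens, out_labels = [], []
    i = 0
    while i < len(tokens):
        label = labels[i]
        if label in SURROGATES:
            # Consume the whole contiguous span carrying this PHI label.
            j = i
            while j < len(tokens) and labels[j] == label:
                j += 1
            surrogate = random.choice(SURROGATES[label]).split()
            out_tokens.extend(surrogate)
            out_labels.extend([label] * len(surrogate))
            i = j
        else:
            out_tokens.append(tokens[i])
            out_labels.append(label)
            i += 1
    return out_tokens, out_labels
```

Context augmentation, the other half of the method, would analogously perturb the non-PHI context tokens (e.g., via synonym replacement) while leaving the PHI spans untouched.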