Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Hu 2018
DOI: 10.18653/v1/n18-1140

CliCR: a Dataset of Clinical Case Reports for Machine Reading Comprehension

Abstract: We present a new dataset for machine comprehension in the medical domain. Our dataset uses clinical case reports with around 100,000 gap-filling queries about these cases. We apply several baselines and state-of-the-art neural readers to the dataset, and observe a considerable gap in performance (20% F1) between the best human and machine readers. We analyze the skills required for successful answering and show how reader performance varies depending on the applicable skills. We find that inferences using doma…

Cited by 70 publications (60 citation statements)
References 43 publications
“…The performances of the baselines rand-entity and maxfreq-entity presented in [7] are very poor because a random entity and the most frequent entity in the passage are used as answers, respectively. The lang-model method performs poor because it is based on queries only, without reading the document, it is difficult to provide accurate answers.…”
Section: Results Analysis; citation type: mentioning
confidence: 99%
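
The two entity baselines described in the statement above can be sketched roughly as follows. This is only an illustration based on that one-sentence description, not the original implementation: the function names, the assumption that the passage's entities arrive as a pre-extracted list of strings, and the choice to sample the random answer from distinct entities are all hypothetical.

```python
import random
from collections import Counter


def maxfreq_entity(passage_entities):
    """Answer with the entity mentioned most often in the passage.

    Assumes `passage_entities` is a non-empty list of entity mentions
    (hypothetical input format). Ties go to the entity seen first.
    """
    counts = Counter(passage_entities)
    return counts.most_common(1)[0][0]


def rand_entity(passage_entities, rng=None):
    """Answer with a randomly chosen entity from the passage.

    Sampling over distinct entities (rather than over mentions) is an
    assumption made for this sketch.
    """
    rng = rng or random.Random()
    return rng.choice(sorted(set(passage_entities)))


# Made-up example mentions, for illustration only.
mentions = ["aspirin", "fever", "aspirin", "rash"]
print(maxfreq_entity(mentions))
print(rand_entity(mentions, random.Random(0)))
```

Neither function looks at the query at all, which is consistent with the statement's point that such baselines score poorly.
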
“…These tasks have attracted some researchers to carry out various researches, and have played dramatic roles in promoting researches in the clinical medical field [6]. And some related data sets have been proposed, such as CliCR [7], PubMedQA [8], Chimed [2] and emrQA [3] etc. Besides, the clinical field has accumulated extensive experience and knowledge, some of which have been uploaded to PubMed, one of the literature databases in the biomedical field, and has nearly 2 million publications with case types [9,10].…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…As questions are not proposed directly from documents, this task is challenging and some information extraction methods fail to deal with it. This methodology of creating MRC datasets enlightens lots of other researches [77,52,69]. In order to avoid that questions can be answered by knowledge out of the documents, all entities in documents are anonymized by random markers.…”
Section: CNN and Daily Mail; citation type: mentioning
confidence: 99%
“…-CliCR To address the problem that there are scarce datasets for specific domains, Suster et al [77] build a large-scale cloze-style dataset based on clinical case reports for healthcare and medicine. Similar to the CNN & Daily Mail, summary points of each case reports are used to create queries by blanking out a medical entity.…”
Section: MS MARCO [51]; citation type: mentioning
confidence: 99%
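
The cloze-style construction mentioned in the statement above (blanking out a medical entity in a summary point) might look something like the sketch below. It is a simplified illustration, not the dataset's actual pipeline: the `make_cloze_query` helper, the `@placeholder` token, and the example sentence are assumptions made here, and real entity recognition is replaced by a plain substring match.

```python
BLANK = "@placeholder"  # assumed blank token, chosen for illustration


def make_cloze_query(summary_point, entity):
    """Turn a summary sentence into a (query, answer) pair.

    Replaces the first occurrence of `entity` with the blank token.
    Returns None if the entity does not occur in the sentence.
    """
    if entity not in summary_point:
        return None
    query = summary_point.replace(entity, BLANK, 1)
    return query, entity


# Made-up summary point, not taken from the dataset.
pair = make_cloze_query("The patient was treated with aspirin.", "aspirin")
print(pair)
```

A reader model is then given the case report together with the query and has to recover the blanked entity.
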
“…• We improve Japanese PAS analysis by combining the PAS-QA and RC-QA datasets. (Welbl et al, 2017; Suster and Daelemans, 2018; Pampari et al, 2018).…”
Section: Introduction; citation type: mentioning
confidence: 99%