emrKBQA: A Clinical Knowledge-Base Question Answering Dataset

Raghavan, Preethi; Liang, Jennifer J.; Mahajan, Diwakar; Chandra, Rachita; Szolovits, Peter

doi:10.18653/v1/2021.bionlp-1.7

Cited by 11 publications

(9 citation statements)

References 15 publications


“…But as far as the authors' knowledge, so far there is no multi-modal clinical dataset that incorporates structured and unstructured EHR data for QA. QA in EHRs has been limited to QA over knowledge bases (Wang et al, 2021), EHR tables (Wang et al, 2020b; Raghavan et al, 2021) or clinical notes (Johnson et al, 2016b; Pampari et al, 2018). emrQA (Pampari et al, 2018) and CliniQG4QA (Yue et al, 2021) There are QA datasets that are generated using template-based method like MIMICSQL (Wang et al, 2020b) and emrKBQA (Raghavan et al, 2021) which utilize the structured EHR tables of MIMIC-III for QA.…”

Section: Related Work (mentioning)

confidence: 99%

“…QA in EHRs has been limited to QA over knowledge bases (Wang et al, 2021), EHR tables (Wang et al, 2020b; Raghavan et al, 2021) or clinical notes (Johnson et al, 2016b; Pampari et al, 2018). emrQA (Pampari et al, 2018) and CliniQG4QA (Yue et al, 2021) There are QA datasets that are generated using template-based method like MIMICSQL (Wang et al, 2020b) and emrKBQA (Raghavan et al, 2021) which utilize the structured EHR tables of MIMIC-III for QA. emrKBQA contains 940,000 questions, logical forms and answers which uses the structured records of MIMIC-III.…”

Section: Related Work (mentioning)

confidence: 99%


DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries

Bardhan, Colas, Roberts et al. 2022

Preprint


This paper develops the first question answering dataset (DrugEHRQA) containing question-answer pairs from both structured tables and unstructured notes from a publicly available Electronic Health Record (EHR). EHRs contain patient records, stored in structured tables and unstructured clinical notes. The information in structured and unstructured EHRs is not strictly disjoint: information may be duplicated, contradictory, or provide additional context between these sources. Our dataset has medication-related queries, containing over 70,000 question-answer pairs. To provide a baseline model and help analyze the dataset, we have used a simple model (MultimodalEHRQA) which uses the predictions of a modality selection network to choose between EHR tables and clinical notes to answer the questions. This is used to direct the questions to the table-based or text-based state-of-the-art QA model. In order to address the problem arising from complex, nested queries, this is the first time Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers (RAT-SQL) has been used to test the structure of query templates in EHR data. Our goal is to provide a benchmark dataset for multi-modal QA systems, and to open up new avenues of research in improving question answering over EHR structured data by using context from unstructured clinical data.


“…Recent works on EHR-QA with structured data (e.g., relational database or knowledge graph) have been focused on converting natural language questions (NLQ) into query languages such as SQL or SPARQL (Wang et al, 2020;Park et al, 2021;Bae et al, 2021) or into domain-specific forms (Raghavan et al, 2021). However, because all previous works mentioned above rely on specific query languages, the problem scope is limited to pre-defined data types (e.g., string, int, timestamp) and operations.…”

Section: Introduction (mentioning)

confidence: 99%

Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records

Kim, Seongsu, Kim et al. 2022

Preprint


Question Answering on Electronic Health Records (EHR-QA) has a significant impact on the healthcare domain, and it is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to pre-defined data types by the specific query language. In order to expand the EHR-QA task beyond this limitation to handle multi-modal medical data and solve complex inference in the future, more primitive systemic language is needed. In this paper, we design the program-based model (NLQ2Program) for EHR-QA as the first step towards the future direction. We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner in order to overcome the absence of gold programs. Without the gold program, our proposed model shows comparable performance to the previous state-of-the-art model, which is an NLQ2Query model (0.9% gain). In addition, for a reliable EHR-QA model, we apply the uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirmed data uncertainty is most indicative of the ambiguity in the input question. Data and Code Availability: Our source code and dataset are available on the official repository.


“…Applying methods in natural language processing to the EHR is a growing field with many potential applications in clinical decision support and augmented care. Corpus and annotation on EHR data are created to model semantic features and relation through linguistic cues, including relation extraction (Mowery et al, 2008), named entity recognition (Wang, 2009;Patel et al, 2018;Lybarger et al, 2021), question answering (Pampari et al, 2018;Raghavan et al, 2021), natural language inference (Romanov and Shivade, 2018), etc. However, few corpora have been built to model clinical thinking, especially about clinical diagnostic reasoning, a process involving clinical evidence acquisition, generating hypothesis, integration and abstraction over medical knowledge and synthesizing a conclusion in the form of a diagnosis and treatment plan (Bowen, 2006).…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding

Gao, Dligach, Miller et al. 2022

Preprint


Applying methods in natural language processing on electronic health records (EHR) data is a growing field. Existing corpus and annotation focus on modeling textual features and relation prediction. However, there is a paucity of annotated corpus built to model clinical diagnostic thinking, a process involving text understanding, domain knowledge abstraction and reasoning. This work introduces a hierarchical annotation schema with three stages to address clinical text understanding, clinical reasoning, and summarization. We created an annotated corpus based on an extensive collection of publicly available daily progress notes, a type of EHR documentation that is collected in time series in a problem-oriented format. The conventional format for a progress note follows a Subjective, Objective, Assessment and Plan heading (SOAP). We also define a new suite of tasks, Progress Note Understanding, with three tasks utilizing the three annotation stages. The novel suite of tasks was designed to train and evaluate future NLP models for clinical text understanding, clinical knowledge representation, inference, and summarization.
