Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2018
DOI: 10.18653/v1/p18-1019
A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature

Abstract: We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the ‘PICO’ elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We acquired annotations f…

Cited by 141 publications (214 citation statements)
References 32 publications (38 reference statements)
“…The mean of the Population (P) sentence scores is significantly lower than that for other types of sentences (I and O), again indicating that they are easier on average to annotate. This aligns with a previous finding that annotating Interventions and Outcomes is more difficult than annotating Participants (Nye et al., 2018).…”
Section: Quantifying Task Difficulty (supporting)
confidence: 92%
“…We again use LSTM-CRF-Pattern as the base model and experiment on the EBM-NLP corpus (Nye et al., 2018). This is trained on either (1) the training set with difficult sentences removed, or (2) the full training set but with instances reweighted in proportion to their predicted difficulty score.…”
Section: Better IE With Difficulty Prediction (mentioning)
confidence: 99%
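The reweighting variant described in that excerpt — scaling each training instance's loss in proportion to a predicted difficulty score — can be sketched as below. This is a minimal illustrative sketch, not the cited authors' implementation; the function names (`difficulty_weights`, `weighted_loss`) and the mean-one normalization are assumptions made here for clarity.

```python
def difficulty_weights(scores, eps=1e-8):
    """Turn predicted difficulty scores into per-instance loss weights,
    proportional to difficulty and normalized so the mean weight is 1.0
    (this keeps the overall loss on the same scale as uniform weighting).
    Falls back to uniform weights if all scores are (near) zero."""
    n = len(scores)
    total = sum(scores)
    if total < eps:
        return [1.0] * n
    return [s * n / total for s in scores]

def weighted_loss(losses, weights):
    """Mean of per-instance losses, each scaled by its weight."""
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)
```

With difficulty scores `[1.0, 2.0, 3.0]`, for example, the weights come out as `[0.5, 1.0, 1.5]`: the hardest instance contributes three times as much to the loss as the easiest one, while the average contribution is unchanged.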
“…In the following, we refer to these data as the ebm-nlp corpus (Nye et al., 2018). The ebm-nlp corpus provided us with 5,000 tokenized and annotated RCT abstracts for training, and 190 expert-annotated abstracts for testing.…”
Section: Ebm-nlp (mentioning)
confidence: 99%