DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature

Dhrangadhariya, Anjani; Müller, Henning

doi:10.18653/v1/2022.bionlp-1.34

Cited by 3 publications

(2 citation statements)

References 20 publications


“…Vocabularies are structured, standardized data sources that do not capture writing variations from clinical literature and custom-built ReGeX are restricted by either task or entity type [35, 36]. We used distant supervision dictionaries created from the structured fields of clinicaltrials.gov (CTO) as described by Dhrangadhariya and Müller [22]. Principal investigators of the clinical study manually enter data in CTO, thereby incorporating large-scale writing variations [37]…”

Section: Methods (mentioning)

confidence: 99%
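
The dictionary-based distant supervision mentioned in the statement above can be illustrated with a minimal sketch. It assumes a small set of lower-cased intervention terms (the `dictionary` contents, the `tag_tokens` helper, and the `max_len` parameter are hypothetical and not taken from the cited papers) and marks every token covered by the longest matching dictionary entry.

```python
def tag_tokens(tokens, dictionary, max_len=4):
    """Label tokens 1 if they fall inside a dictionary match, else 0.

    Matching is case-insensitive and greedy: at each position the longest
    n-gram (up to max_len tokens) found in the dictionary wins.
    """
    lowered = [t.lower() for t in tokens]
    labels = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(lowered[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                labels[j] = 1
            i += matched
        else:
            i += 1
    return labels


# Hypothetical dictionary, standing in for terms harvested from a
# structured registry field such as an intervention name.
dictionary = {"aspirin", "cognitive behavioural therapy"}
tokens = "Patients received cognitive behavioural therapy or aspirin daily".split()
labels = tag_tokens(tokens, dictionary)
```

Labels produced this way are noisy: exact matching misses spelling variants, and it tags a term wherever it appears, regardless of its role in the sentence.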


Not so weak PICO: leveraging weak supervision for participants, interventions, and outcomes recognition for systematic review automation

Dhrangadhariya, Müller (2023), JAMIA Open


Objective: The aim of this study was to test the feasibility of PICO (participants, interventions, comparators, outcomes) entity extraction using weak supervision and natural language processing.

Methodology: We re-purpose more than 127 medical and nonmedical ontologies and expert-generated rules to obtain multiple noisy labels for PICO entities in the evidence-based medicine (EBM)-PICO corpus. These noisy labels are aggregated using simple majority voting and generative modeling to get consensus labels. The resulting probabilistic labels are used as weak signals to train a weakly supervised (WS) discriminative model and observe performance changes. We explore mistakes in the EBM-PICO that could have led to inaccurate evaluation of previous automation methods.

Results: In total, 4081 randomized clinical trials were weakly labeled to train the WS models and compared against full supervision. The models were separately trained for PICO entities and evaluated on the EBM-PICO test set. A WS approach combining ontologies and expert-generated rules outperformed full supervision for the participant entity by 1.71% macro-F1. Error analysis on the EBM-PICO subset revealed 18–23% erroneous token classifications.

Discussion: Automatic PICO entity extraction accelerates the writing of clinical systematic reviews that commonly use PICO information to filter health evidence. However, PICO extends to more entities: PICOS (S: study type and design), PICOC (C: context), and PICOT (T: timeframe), for which labelled datasets are unavailable. In such cases, the ability to use weak supervision overcomes the expensive annotation bottleneck.

Conclusions: We show the feasibility of WS PICO entity extraction using freely available ontologies and heuristics without manually annotated data. Weak supervision has encouraging performance compared to full supervision but requires careful design to outperform it.
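
The simple majority voting step named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `ABSTAIN` value, the `majority_vote` and `aggregate` helpers, the tie-breaking rule, and the toy label matrix are all hypothetical. It produces hard consensus labels only and does not cover the generative modeling or the probabilistic labels the abstract also mentions.

```python
from collections import Counter

ABSTAIN = -1  # assumed marker for a labelling source with no opinion


def majority_vote(votes, abstain=ABSTAIN):
    """Return the most frequent non-abstain label for one token.

    Returns `abstain` when no source voted or when the top two labels tie.
    """
    counts = Counter(v for v in votes if v != abstain)
    if not counts:
        return abstain
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return abstain
    return ranked[0][0]


def aggregate(label_matrix):
    """Apply majority voting to each row (one row per token)."""
    return [majority_vote(row) for row in label_matrix]


# Toy matrix: rows are tokens, columns are three hypothetical labelling
# sources (for example two ontologies and one rule); 1 = entity, 0 = not.
label_matrix = [
    [1, 1, 0],     # two of three sources vote "entity"
    [0, -1, 0],    # one source abstains, the others agree on 0
    [-1, -1, -1],  # no source has an opinion
    [1, 0, -1],    # tie between 1 and 0
]
consensus = aggregate(label_matrix)
```

Abstaining on ties is one possible design choice; an alternative is to fall back to the negative class, which trades recall for fewer unlabeled tokens.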



“…One of these approaches only explores distant supervision for intervention extraction using a single labelling source [22]. The other approach studies weak supervision for PICO span extraction but still utilizes some supervised annotation signals about whether a sentence includes PICO information [23]…”

Section: Introduction (mentioning)

confidence: 99%