Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
DOI: 10.1145/3132847.3132989
A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation

Abstract: We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically we aim to build a model to infer distinct sets of (ontological) concepts describing complementary clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the PICO elements. This important practical problem poses a few key challenges. One issue is that the output s…

Cited by 12 publications (24 citation statements). References 16 publications.
“…32 describes removing sentence headings in structured abstracts in order to avoid creating a system biased towards common terms, while 63 discusses abbreviations and grammar as factors influencing the results. Several publications demonstrated effects of input-text length 34 and of a sentence's position within a paragraph or abstract, e.g., up to 10% lower classification scores for certain sentence combinations in unstructured abstracts. 30,37,72 3.4.5.3 Is the process of avoiding overfitting or underfitting described?…”
Section: Are Explanations For the Influence Of Both Visible And Hidden Variables In The Dataset Given?
confidence: 97%
“…Most data extraction approaches focused on recognising instances of entity or sentence classes, and a small number of publications went one step further to normalise to actual concepts. 34,58 The 'Other' category includes some more detailed drug annotations 36 or information such as confounders 26 and other entity types (see the full dataset in Underlying data: Appendix A for more information 86 ).…”
Section: Data Extraction Targets
confidence: 99%
“…: {P, I, C, O, R} (Wallace, 2019). Consequently, collecting such explicit evidence is vital for further analyses, and is also the objective for most relevant works: Some seek to find relevant papers through retrieval (Lee and Sun, 2018); many works are aimed at extracting PICO elements from published literature (Wallace et al, 2016;Singh et al, 2017;Jin and Szolovits, 2018;Nye et al, 2018;Zhang et al, 2020); the evidence inference task extracts R for a given ICO query using the corresponding clinical trial report (Lehman et al, 2019;DeYoung et al, 2020). However, since getting expert annotations is expensive, these works are typically limited in scale, with only thousands of labeled instances.…”
Section: Related Work
confidence: 99%
“…One particular challenge of this task is that evidence is entangled with other free-texts in the literature. Prior works have explored explicit methods for evidence integration through a pipeline of retrieval, extraction and inference on structured {P,I,C,O,R} evidence (Wallace et al, 2016;Singh et al, 2017;Jin and Szolovits, 2018;Lee and Sun, 2018;Nye et al, 2018;Lehman et al, 2019;DeYoung et al, 2020;Zhang et al, 2020). However, they are limited in scale since getting domain-specific supervision for all clinical evidence is prohibitively expensive.…”
Section: Introduction
confidence: 99%
“…Overall accuracy is approaching 50%, which may not yet be sufficient for global roll-out, but does represent good progress, considering how challenging this classification task is (bearing in mind there are hundreds of thousands of terms for the machine to learn from very little training data). [22] What are the data?…”
Section: What Type Of Study Is This?
confidence: 99%