2017
DOI: 10.1145/3106235

On the Effects of Low-Quality Training Data on Information Extraction from Clinical Reports

Abstract: In the last five years there has been a flurry of work on information extraction from clinical documents, that is, on algorithms capable of extracting, from the informal and unstructured texts that are generated during everyday clinical practice, mentions of concepts relevant to such practice. Many of these research works are about methods based on supervised learning, that is, methods for training an information extraction system from manually annotated examples. While a lot of work has been devoted to devisi…
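The abstract describes training extraction systems from manually annotated examples. One way to make "annotation quality" concrete is to compare two annotations of the same report at the level of concept mentions. This is a minimal sketch under stated assumptions, not the paper's experimental protocol: it assumes each annotation is a set of hypothetical `(start, end, concept_type)` spans, counts only exact matches, and uses an invented toy report annotated by a hypothetical expert and a hypothetical novice.

```python
from typing import Set, Tuple

# Hypothetical span format (an assumption, not from the paper):
# (start character offset, end character offset, concept type)
Span = Tuple[int, int, str]


def mention_prf(reference: Set[Span], other: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of `other` measured against `reference`."""
    true_pos = len(reference & other)  # spans identical in offsets AND concept type
    precision = true_pos / len(other) if other else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy annotations of a single report, invented purely for illustration.
expert = {(0, 9, "Finding"), (15, 22, "Location"), (30, 41, "Finding")}
novice = {(0, 9, "Finding"), (15, 22, "Finding"), (30, 41, "Finding"), (45, 50, "Location")}

p, r, f = mention_prf(expert, novice)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

On this toy pair the novice annotation scores P=0.50, R=0.67, F1=0.57 against the expert one: the mistyped span at offsets 15 to 22 and the extra span at 45 to 50 both count as errors. Exact matching is the strictest choice; ignoring the concept type, for instance, would score the same pair higher (P=0.75, R=1.00).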

Cited by 8 publications (2 citation statements)
References 50 publications (31 reference statements)
“…In 2017, two Italian researchers, Marcheggiani and Sebastiani (44), devoted their studies to investigate the training data-quality effects on the learning process for the clinical domain. In particular, they focused on information extraction systems.…”
Section: NLP Applications in Clinical Context
mentioning
confidence: 99%
“…This is more severe in enterprise domains where labels come from different persons at different locations and times who follow different rules. Recent research [18] studied the effect of low-quality training data on clinical reports, and demonstrated that the difficulty of acquiring high-quality data actually bottlenecks the wide adoption of the data-driven approach in the health domain. Indeed, how can we obtain super-human accuracy from noisy inaccurate data?…”
mentioning
confidence: 99%