2013
DOI: 10.1186/1471-2105-14-10

Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies

Abstract: Background: The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entities mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicia…

Cited by 86 publications (91 citation statements)
References 38 publications

“…Recent research has shown that naïve EHR statistical analyses can lead to the reversals of cause and effect [16], induction of spurious signals [17], large errors when predicting optimal drug dosage [18], cancellation of temporal signals when aggregating different cohorts [19, 20, 21], and model distortion when not accounting for redundancy in the narrative part of the EHR [22]. …”
Section: Introduction (mentioning)
confidence: 99%
“…By a deep learning process [30] of the Transmissions file (left side of figure 2), using problems' list logic [31][32][33] and pattern matching with the SQL LIKE operator [34], twenty-four syndromes were implemented [35][36][37][38][39][40][41][42], following the Sursaud® SSS method [13]. …”
Section: Ontology (mentioning)
confidence: 99%
“…In addition, EMRs typically stem from different physicians and laboratories. This results in large amounts of redundant information yet presented in different writing styles but without guarantee to be complete (Weiskopf et al., 2013; Cohen et al., 2013). Some of the EMRs may be composed out of free form written text whereas others contain dictated text, tables or a mixture of tables and text. …”
Section: Introduction (mentioning)
confidence: 99%