2013
DOI: 10.13063/2327-9214.1035

Strategies for Handling Missing Data in Electronic Health Record Derived Data

Abstract: Electronic health records (EHRs) present a wealth of data that are vital for improving patient-centered outcomes, although the data can present significant statistical challenges. In particular, EHR data contains substantial missing information that if left unaddressed could reduce the validity of conclusions drawn. Properly addressing the missing data issue in EHR data is complicated by the fact that it is sometimes difficult to differentiate between missing data and a negative value. For example, a patient w…

Cited by 268 publications (238 citation statements)
References: 28 publications
“…Unfortunately, in contrast to confounding bias,11–21 the control of selection bias in EHR-based settings has received virtually no attention in the literature. This may be due, in part, to the notion that selection bias can be cast as a missing data problem and that statistical methods for missing data are well established22,23 and can be readily applied to EHR-based CER 24…”
Section: Introduction (mentioning)
confidence: 99%
“…We compared imputation using the CMM to population mean imputation (as a baseline), multivariate imputation using chained equations (MICE) [6,38], and k-nearest neighbors imputation. For our purposes, we set the prediction method of MICE to predictive mean matching [6] for non-categorical variables and logistic/polytomous regression for categorical variables.…”
Section: Methods (mentioning)
confidence: 99%
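
As an illustration of two of the baselines named in the statement above, the following is a minimal sketch of population mean imputation and k-nearest neighbors imputation in plain Python. It is not the citing authors' implementation: the function names, the distance rule (root mean squared difference over columns observed in both rows), and the toy records are all assumptions made for this example. MICE and the CMM are not shown.

```python
import math

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def _distance(a, b):
    """Root mean squared difference over columns observed in both rows.

    Returns None when the two rows share no observed column.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(rows, k=2):
    """Replace each None with the mean of that column over the k nearest rows
    that have the column observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = _distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [val for _, val in candidates[:k]]
            if nearest:  # leave the gap in place if no usable neighbor exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical toy records: two numeric columns, one missing value in the last row.
records = [
    [1.0, 10.0],
    [2.0, 20.0],
    [9.0, 90.0],
    [1.5, None],
]
mean_filled = mean_impute(records)
knn_filled = knn_impute(records, k=2)
```

On these toy records the mean baseline fills the gap with the column average, while the nearest-neighbor variant uses only the two rows closest to the incomplete one, so the two methods can give quite different values for the same cell.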
“…Also, the missing pattern of time series data may also contain information that could improve the performance of model prediction. The other option is to fix the missing values by resampling or interpolation, but these methods may require knowledge of the whole dataset before dealing with missing data, and may result in a two-staged modelling process (Wells et al, 2013). Recent works tried to model explicitly the missingness of various datasets (Wu et al, 2015), or interpolate according to the time series information of missing data in health care dataset (Che et al, 2016).…”
Section: Fixing Missing Values (mentioning)
confidence: 99%
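
The interpolation option mentioned in the statement above can be sketched as follows. This is a hedged illustration only, not code from any of the cited works: the function names and the heart-rate series are hypothetical. It fills gaps by linear interpolation and keeps a separate missingness mask, since the statement notes that the missing pattern itself may carry information.

```python
def missing_mask(series):
    """Return 1 where a value is missing and 0 where it is observed."""
    return [1 if v is None else 0 for v in series]

def interpolate(series):
    """Fill None entries by linear interpolation between the nearest observed
    neighbors; gaps at either edge take the nearest observed value."""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((o for o in observed if o < i), default=None)
        right = min((o for o in observed if o > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled

# Hypothetical heart-rate readings with gaps at the edges and in the middle.
heart_rate = [None, 70.0, None, None, 76.0, None]
mask = missing_mask(heart_rate)   # stage 1a: record where values were missing
filled = interpolate(heart_rate)  # stage 1b: fill gaps before any modelling
```

Filling an interior gap requires the next observed value, which is why this kind of fix needs the whole series in hand first and leads to the two-stage process the statement describes: impute, then model.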