2021
DOI: 10.1038/s41746-021-00518-0

Imputation of missing values for electronic health record laboratory data

Abstract: Laboratory data from Electronic Health Records (EHR) are often used in prediction models where estimation bias and model performance from missingness can be mitigated using imputation methods. We demonstrate the utility of imputation in two real-world EHR-derived cohorts of ischemic stroke from Geisinger and of heart failure from Sutter Health to: (1) characterize the patterns of missingness in laboratory variables; (2) simulate two missing mechanisms, arbitrary and monotone; (3) compare cross-sectional and mu…

Citations: cited by 55 publications (36 citation statements)
References: 42 publications
“…Limitations: (1) Single healthcare system cohort with one ethnic background; (2) Limited sample size (lack of power) for a prediction study in subgroups; and (3) Challenges to the survival analysis using EHR data which are often high-dimensional, censored, have high and not-completely-at-random missingness, and low prevalence for the outcome of interest [20].…”
Section: Discussion (mentioning)
confidence: 99%
“…For the self-reported variables (such as alcohol, smoking) the missing value was replaced by zero. The BMI and systolic and diastolic blood pressures were imputed by MICE 2lpan, an appropriate strategy as we previously demonstrated [20]. No imputation was conducted for NIHSS (missingness at 37.8%) as there is no consensus strategy to impute this variable.…”
Section: Methods (mentioning)
confidence: 99%
“…Currently, the limitations in AI-based models are mostly centered on the lack of sufficient patient representation, balanced cohorts, and biases introduced by cohort definitions or selection of variables, as well as the exclusion of a certain group of patients. Machine learning models pick up biases from the training datasets; therefore, to reach new heights, it is of fundamental importance to increase patient representation and data density and improve data for downstream modeling [154, 155]. Finally, in terms of methodologies, both fields are taking advantage of advances in machine learning frameworks and tools.…”
Section: Discussion (mentioning)
confidence: 99%
“…Imputation has remarkably improved the statistical power of genome-wide association studies to identify novel genetic risk loci, and is facilitated by large reference datasets with deep genotypic coverage such as 1000 Genomes [149], the UK10K [150], the Haplotype reference consortium [151] and, recently, TOPMed [89]. Beyond genomics, imputation has also demonstrated utility for other types of medical data [152]. Different strategies have been suggested to make fewer assumptions.…”
Section: Masked (And Shifted) Target (mentioning)
confidence: 99%