2020
DOI: 10.1038/s41467-020-19270-2

Multiple imputation for analysis of incomplete data in distributed health data networks

Abstract: Distributed health data networks (DHDNs) leverage data from multiple sources or sites, such as electronic health records (EHRs) from multiple healthcare systems, and have drawn increasing interest in recent years, as they do not require sharing of subject-level data and hence considerably lower the hurdles for collaboration between institutions. However, DHDNs face a number of challenges in data analysis, particularly in the presence of missing data. The current state-of-the-art methods for handling incomplete …

Cited by 33 publications (26 citation statements)
References 24 publications
“…In addition, missing data present some unique challenges in a distributed data setting, while some privacy-preserving missing data methods have been developed, this remains a nascent area of research. 59,60 Furthermore, sharing of genetic data and biobank data are important to facilitate large-scale genetic studies, but special attention is needed to protect patient privacy and confidentiality.…”
Section: Data Privacy (Yong Chen) mentioning
confidence: 99%
“…Analyses under the scenario where some centers do not collect data on specific covariates may be challenging. For example, multiple imputation (which has seen data‐privacy adaptions 40) would require considerable coordination between centers if a variable is entirely unavailable at one site, and the imputation model must be developed at another. And third, even if this is statistically feasible, it may not be practical to use these approaches in a multi‐stage setting, as it would require going back and forth to each center for the different steps that need to be performed at each iteration (creation of the matrix data, estimating the optimal rule, computation of the pseudo‐outcome, etc.…”
Section: Discussion mentioning
confidence: 99%
“…REMARK 3: Given the pairwise conditioning technique used in the proposed algorithm, the proposed dCLR algorithm can handle the missingness in the data, especially some missing not at random mechanisms as outlined in our earlier investigation; see Chen et al 2015 46; and also see Chan 2013 47, and Ning et al 2017 39. If the data are missing at random, imputation methods such as inverse probability weighting and imputation 48 can be considered before implementing the dCLR algorithm.…”
Section: Methods mentioning
confidence: 99%