2019
DOI: 10.1002/cjs.11513

Synthetic data method to incorporate external information into a current study

Abstract: We consider the situation where there is a known regression model that can be used to predict an outcome, Y, from a set of predictor variables X. A new variable B is expected to enhance the prediction of Y. A dataset of size n containing Y, X and B is available, and the challenge is to build an improved model for Y|X,B that uses both the available individual-level data and some summary information obtained from the known model for Y|X. We propose a synthetic data approach, which consists of creating m additiona…
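To make the idea of "creating m additional" observations concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, a single covariate X, a known linear-Gaussian model Y = a0 + a1·X + e with known residual SD, and that synthetic X values are resampled from the internal data; the abstract is truncated before it says how X is generated, so that choice is ours. The helper `make_synthetic` and all numbers are hypothetical. B is not part of the known model, so it is left missing in every synthetic row.

```python
import random
from typing import List, Optional, Tuple

# A row is (y, x, b); b is None when B was not observed.
Row = Tuple[float, float, Optional[float]]


def make_synthetic(internal: List[Row], coef: Tuple[float, float],
                   sigma: float, m: int, seed: int = 0) -> List[Row]:
    """Hypothetical helper: draw m synthetic rows from a known model for Y|X.

    Assumes the known model is Y = coef[0] + coef[1] * X + e with
    e ~ N(0, sigma^2), and that X can be resampled from the internal data.
    B is not part of the known model, so it is left missing (None).
    """
    rng = random.Random(seed)
    xs = [x for _, x, _ in internal]
    rows: List[Row] = []
    for _ in range(m):
        x = rng.choice(xs)                                  # resample an observed X
        y = coef[0] + coef[1] * x + rng.gauss(0.0, sigma)   # simulate Y from the known model
        rows.append((y, x, None))                           # B unobserved in synthetic data
    return rows


# Toy internal data (n = 4) with Y, X and B all observed.
internal = [(1.2, 0.5, 1.0), (2.9, 1.5, 0.0), (2.1, 1.0, 1.0), (3.8, 2.0, 0.0)]
synthetic = make_synthetic(internal, coef=(0.3, 1.6), sigma=0.5, m=8)
stacked = internal + synthetic          # combined dataset of size n + m
print(len(stacked), sum(b is None for _, _, b in stacked))  # prints: 12 8
```

The stacked rows would then be handed to a method that can handle the missing B values (multiple imputation, for example); that step is not shown here.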

Cited by 14 publications (23 citation statements); references 16 publications (18 reference statements).
“…As shown in Figure 3c, we show the result of the Louis information estimator (StackImpute-Louis), one of the three variance estimators proposed by Beesley and Taylor (2020) as they always have similar performances. Gu et al (2019) has shown that when the synthetic data size goes to infinity, the precision gain we achieve in X covariates will converge to a constant, which is shown as the gradually stable trend of the grey curve (Monte Carlo empirical variance of the point estimates, also serves as the empirical truth here). When the synthetic data size increases from one times the internal data size to 10 times for each external study (i.e., total missing rate increases from 66.6% to 95%), the StackImpute variance estimator and Rubin's rule variance continuously underestimate the empirical truth.…”
Section: Simulation Results (mentioning; confidence: 94%)
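The statement above compares the StackImpute variance estimator with the variance from Rubin's rules. For reference, Rubin's rules pool M completed-data analyses of a scalar parameter as Q̄ = mean of the estimates, W = mean of their variances, B = sample variance of the estimates, and total variance T = W + (1 + 1/M)·B. The sketch below implements only those standard combining rules; the function name `rubin_pool` and the toy inputs are hypothetical, and it does not reproduce the StackImpute-Louis estimator of Beesley and Taylor (2020).

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def rubin_pool(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Combine M completed-data results for one scalar parameter (Rubin's rules).

    estimates[i] and variances[i] are the point estimate and its squared
    standard error from the i-th imputed dataset.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputations")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor M - 1)
    total = w + (1 + 1 / m) * b      # Rubin's total variance
    return q_bar, total


# Toy results from M = 4 imputations (illustrative numbers only).
q, t = rubin_pool([1.0, 1.2, 0.8, 1.0], [0.04, 0.04, 0.04, 0.04])
print(round(q, 4), round(t, 4))  # prints: 1.0 0.0733
```

The (1 + 1/M) factor inflates the between-imputation component to reflect the use of a finite number of imputations.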
“…Step 1: Convert each external summary-level information into a set of synthetic data according to Gu et al (2019) and append each of the synthetic data sets to the internal data, from which we create a longer dataset as illustrated in Figure 1. The synthetic data for external study k constitutes of observed X k and the simulated value of Y. Unmeasured variables in the external populations (all B and some X's) will be treated as missing data.…”
Section: Proposed Data Integration and Analysis Strategy (mentioning; confidence: 99%)
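The stacking in Step 1 can be sketched as below, under stated assumptions: rows are plain dictionaries, a hypothetical `source` column records which study each row came from, and any variable an external study did not record (always B, sometimes an X) is set to None. The helpers `stack`, `synthetic_rows` and `missing_rate` are illustrative names, and the 0.0 values are placeholders for simulated data. With K external studies each contributing r·n synthetic rows, the share of stacked rows missing B is K·r / (1 + K·r); if there were two external studies (our assumption, not stated in the excerpt), r = 1 gives 2/3 ≈ 66.7% and r = 10 gives 20/21 ≈ 95.2%, in line with the missing rates quoted from the simulation results.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def stack(internal: List[Row], externals: List[List[Row]],
          columns: List[str]) -> List[Row]:
    """Append each external synthetic set to the internal data (long format).

    Columns an external study did not record (always B, possibly some X's)
    become None so a missing-data method can handle them later. A 'source'
    column records the origin of each row: 0 = internal, k = external study k.
    """
    out: List[Row] = [{**{c: r.get(c) for c in columns}, "source": 0} for r in internal]
    for k, ext in enumerate(externals, start=1):
        out.extend({**{c: r.get(c) for c in columns}, "source": k} for r in ext)
    return out


def missing_rate(rows: List[Row], column: str) -> float:
    """Fraction of stacked rows in which `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)


def synthetic_rows(n_rows: int, observed: List[str]) -> List[Row]:
    """Hypothetical placeholder for one external study's synthetic rows."""
    return [{c: 0.0 for c in observed} for _ in range(n_rows)]


columns = ["y", "x1", "x2", "b"]
internal = [{"y": 1.0, "x1": 0.2, "x2": 1.0, "b": 0.0} for _ in range(3)]  # toy, n = 3
for ratio in (1, 10):
    n = len(internal)
    externals = [synthetic_rows(ratio * n, ["y", "x1", "x2"]),  # study 1 measured both X's
                 synthetic_rows(ratio * n, ["y", "x1"])]        # study 2 lacks x2
    rows = stack(internal, externals, columns)
    print(ratio, round(missing_rate(rows, "b"), 3))
# prints "1 0.667", then "10 0.952"
```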
“…In presence of this type of auxiliary data, methods based on the generalized method of moment, generalized regression, weight calibration, constrained maximum likelihood, empirical likelihood, etc., have been proposed to borrow auxiliary information to power up the main study. [1][2][3][4][5][6][7][8][9][10][11][12][13] In this article, we consider a different type of auxiliary data that is also widely seen in applications, that is, an auxiliary measurement collected in the same study but served as the outcome in a secondary analysis. Usually, such kind of auxiliary measurement is highly associated with the primary outcome, and how to incorporate this secondary information to enhance estimation precision for the main analysis is of high interest.…”
Section: Introduction (mentioning; confidence: 99%)