2021
DOI: 10.1111/rssa.12689

Two-Phase Sampling Designs for Data Validation in Settings with Covariate Measurement Error and Continuous Outcome

Abstract: Measurement errors are present in many data collection procedures and can harm analyses by biasing estimates. To correct for measurement error, researchers often validate a subsample of records and then incorporate the information learned from this validation sample into estimation. In practice, the validation sample is often selected using simple random sampling (SRS). However, SRS leads to inefficient estimates because it ignores information on the error-prone variables, which can be highly correlated to the…

Cited by 11 publications (9 citation statements)
References 60 publications
“…Given a set of strata, Neyman allocation samples proportional to the number of observations in the strata times the standard deviation of the variable of interest in the strata. Since the log HR estimator from the Cox model is asymptotically equivalent to the sum of influence functions, Neyman allocation in our setting is to sample proportional to the product of the number of records in a stratum times the standard deviation of the influence function for the target coefficient in that stratum (Amorim et al., 2021). Again, we do not know the true influence function, but we can estimate it from phase 1 data, and as we collect phase 2 data, we can update estimates of it and adjust our sampling accordingly.…”
Section: Multiwave Phase 2 Validation Design
Citation type: mentioning
confidence: 99%
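The allocation rule described in this statement can be illustrated with a short sketch. It is not taken from the cited paper: the function name `neyman_allocation`, the largest-remainder rounding, and the example numbers are assumptions made here for illustration. The inputs are the stratum sizes and an estimate of the within-stratum standard deviation (for example, of an estimated influence function), and the output is a per-stratum sample size proportional to their product.

```python
import numpy as np

def neyman_allocation(stratum_sizes, stratum_sds, n_total):
    """Split n_total across strata proportionally to N_h * S_h.

    stratum_sizes: number of records N_h in each stratum.
    stratum_sds:   estimated standard deviation S_h in each stratum.
    Rounding uses the largest-remainder method (an assumption of this
    sketch) so that the allocations sum exactly to n_total.
    """
    sizes = np.asarray(stratum_sizes, dtype=float)
    sds = np.asarray(stratum_sds, dtype=float)
    weights = sizes * sds
    if weights.sum() <= 0:
        raise ValueError("at least one stratum needs a positive N_h * S_h")

    raw = n_total * weights / weights.sum()   # ideal, non-integer allocation
    alloc = np.floor(raw).astype(int)         # round down first
    remainder = int(n_total - alloc.sum())    # units still to hand out
    order = np.argsort(-(raw - alloc))        # largest fractional parts first
    alloc[order[:remainder]] += 1
    return alloc

# Hypothetical example: three strata, 190 records to validate.
print(neyman_allocation([500, 300, 200], [1.0, 2.0, 4.0], 190))
```

In a multiwave design, one would presumably call such a function once per wave, replacing `stratum_sds` with estimates updated from the phase 2 data collected so far. The sketch does not cap an allocation at the stratum size, which a real design would need to handle.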
“…This result does not imply that the worst-case misspecification will be plausible, but the analyses of Han et al (2021) provide examples where it is, and Breslow et al (2013) give an illustration in data from the Women's Health Initiative. Amorim et al (2021) compared several model-based and design-based strategies and found that they all gave improvements over simple random sampling, and argued that the ideal choice of design and analysis depended on context.…”
Section: Model-based Analysis: the Efficiency Gap
Citation type: mentioning
confidence: 99%
“…In practice however, research interest often lies in the relationship between covariates and an outcome of interest, which is explored through regression modelling. Chen and Lumley (2020) and Amorim et al (2021) point out that Neyman and Wright optimum allocation strategies are still useful in these cases because regression parameters can be considered as the sum of their influence functions. Thus, they show that the allocation of samples to strata that minimizes the variance of the estimate of the sum of influence functions leads to the optimal sampling design, which can be written as:…”
Section: Optimum Allocation For Regression Parameters
Citation type: mentioning
confidence: 99%
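The idea that a regression coefficient behaves like a sum of per-record influence functions can be made concrete with a small sketch. It assumes ordinary least squares rather than the models used in the citing papers, and the helper names `ols_influence` and `stratum_sd` are hypothetical. The per-stratum standard deviations it returns are the kind of quantity that would feed a Neyman-type allocation; the formula elided in the statement above is not reproduced here.

```python
import numpy as np

def ols_influence(X, y):
    """Return OLS coefficients and estimated influence functions.

    For OLS, row i of the returned matrix is
        (X'X / n)^{-1} x_i * e_i,
    where e_i is the fitted residual. Averaging these rows over the
    sample gives the first-order error of the coefficient estimate.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X / n)       # symmetric p x p matrix
    infl = (X * resid[:, None]) @ bread      # n x p, one row per record
    return beta, infl

def stratum_sd(values, strata):
    """Sample standard deviation of `values` within each stratum label."""
    values = np.asarray(values, dtype=float)
    strata = np.asarray(strata)
    labels = np.unique(strata)
    sds = np.array([values[strata == h].std(ddof=1) for h in labels])
    return labels, sds

# Hypothetical usage: influence function of the slope, summarised by stratum.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(size=200)
beta, infl = ols_influence(X, y)
labels, sds = stratum_sd(infl[:, 1], x > 0)  # two strata split at x = 0
```

The column `infl[:, 1]` corresponds to the slope, so its within-stratum spread is what an allocation targeting that coefficient would use.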
“…In stratified sampling, efficiency is gained when strata cut points are chosen to minimize within-stratum variances and maximize the variance across strata. In some cases, it is also desirable for each of the strata to have similar sample sizes (Amorim et al, 2021). Determining the split points on which to define strata also typically relies on unknown parameters, but estimates for the within-stratum variances can be obtained through an auxiliary variable correlated with the variable of interest or from previous sampling waves in which the variable of interest was collected.…”
Section: Allocating Samples In Waves
Citation type: mentioning
confidence: 99%
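One way to compare candidate cut points using an auxiliary variable is sketched below. The criterion, the sum over strata of N_h times S_h, is the quantity whose square drives the variance of a stratified mean under Neyman allocation when the finite-population correction is ignored; using it to rank cut points is an assumption of this sketch, not a procedure stated in the citing paper. The names `neyman_criterion` and `best_single_cut` and the `min_size` guard are hypothetical.

```python
import numpy as np

def neyman_criterion(aux, cut_points, min_size=2):
    """Sum of N_h * S_h over the strata defined by cut_points.

    aux:        auxiliary variable correlated with the variable of interest.
    cut_points: values at which aux is split into strata.
    Returns infinity if any stratum has fewer than min_size records,
    so that degenerate splits are never preferred.
    """
    aux = np.asarray(aux, dtype=float)
    strata = np.digitize(aux, np.sort(np.asarray(cut_points, dtype=float)))
    total = 0.0
    for h in np.unique(strata):
        members = aux[strata == h]
        if members.size < min_size:
            return float("inf")
        total += members.size * members.std(ddof=1)
    return total

def best_single_cut(aux, candidates):
    """Pick the candidate cut point with the smallest criterion value."""
    scores = [neyman_criterion(aux, [c]) for c in candidates]
    return candidates[int(np.argmin(scores))]

# Hypothetical example: two well-separated clusters of auxiliary values.
aux = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
print(best_single_cut(aux, [0.15, 5.0, 10.15]))
```

The same comparison could be rerun after each sampling wave, with `aux` replaced by values of the variable of interest observed in earlier waves.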