Integrating Probability and Nonprobability Samples for Survey Inference

Wiśniowski, Arkadiusz; Sakshaug, Joseph W.; Ruiz, Diego Andrés Pérez; Blom, Annelies G.

doi:10.1093/jssam/smz051

Cited by 48 publications

(34 citation statements)

References 20 publications


“…More generally, there is an extensive literature on approaches for making inferences from data collected from nonprobability samples 50–52. Other promising approaches include integrating surveys of varying quality 53, 54, and leveraging the estimated ddc in one outcome to correct bias in others under several scenarios (Supplementary Information D).…”

Section: Discussion (mentioning)

confidence: 99%

Unrepresentative big surveys significantly overestimated US vaccine uptake

Bradley, Kuriwaki, Isakov et al. 2021

Nature

182

184


Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox 1. Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from 9 January to 19 May 2021 from two large surveys: Delphi–Facebook 2, 3 (about 250,000 responses per week) and Census Household Pulse 4 (about 75,000 every two weeks). In May 2021, Delphi–Facebook overestimated uptake by 17 percentage points (14–20 percentage points with 5% benchmark imprecision) and Census Household Pulse by 14 (11–17 percentage points with 5% benchmark imprecision), compared to a retroactively updated benchmark the Centers for Disease Control and Prevention published on 26 May 2021. Moreover, their large sample sizes led to minuscule margins of error on the incorrect estimates. By contrast, an Axios–Ipsos online panel 5 with about 1,000 responses per week following survey research best practices 6 provided reliable estimates and uncertainty quantification. We decompose observed error using a recent analytic framework 1 to explain the inaccuracy in the three surveys. We then analyse the implications for vaccine hesitancy and willingness. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters more than data quantity, and that compensating the former with the latter is a mathematically provable losing proposition.
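The claim that 250,000 responses can be worth about 10 simple-random-sample responses can be illustrated with the effective-sample-size approximation from the data defect correlation (ddc) framework the abstract refers to, n_eff ≈ f / (1 − f) / ddc², where f = n / N is the sampling fraction. The sketch below is only illustrative: the population size `N_ADULTS` and the ddc value of 0.01 are assumptions chosen for the example, not figures reported in the paper.

```python
def effective_sample_size(n, N, ddc):
    """Approximate size of a simple random sample with the same mean squared
    error as a biased sample of size n from a population of size N.

    Uses the approximation n_eff ~ (f / (1 - f)) / ddc**2 with f = n / N.
    """
    f = n / N  # sampling fraction
    return (f / (1.0 - f)) / ddc ** 2


# Hypothetical inputs for illustration only (not taken from the paper):
N_ADULTS = 255_000_000   # assumed size of the US adult population
n_eff = effective_sample_size(n=250_000, N=N_ADULTS, ddc=0.01)
print(round(n_eff, 1))   # prints 9.8
```

Under these assumed inputs, a ddc of only 0.01 is enough to reduce a quarter of a million responses to roughly ten simple-random-sample responses, because the tiny sampling fraction f multiplies the effect of even a small selection bias.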


“…To address the limitations of non-probability sampling, we applied innovative strategies for sensitivity analyses to strengthen conclusions. 16 17 Additionally, we adjusted for non-response in both samples by using sample weights based on several sociodemographic characteristics (ie, sex, race/ethnicity, age and educational attainment), a standard procedure for addressing non-response in surveys.…”

Section: Discussion (mentioning)

confidence: 99%
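As a rough illustration of weighting on a sociodemographic characteristic, the sketch below computes post-stratification weights (population share divided by sample share) for a single grouping variable. The quoted statement does not say which weighting procedure was used, so this single-variable post-stratification, the group labels, and the population shares are all assumptions made for the example.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each respondent by (population share) / (sample share) of their group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: the "young" group is under-represented (2 of 8 respondents)
# relative to an assumed 50/50 population split.
groups = ["young", "young", "old", "old", "old", "old", "old", "old"]
answers = [1, 1, 0, 0, 0, 0, 0, 0]          # 1 = agrees, 0 = disagrees
shares = {"young": 0.5, "old": 0.5}          # assumed population shares

weights = poststratification_weights(groups, shares)
unweighted = sum(answers) / len(answers)
adjusted = weighted_mean(answers, weights)
```

With these made-up inputs the unweighted estimate understates agreement because the under-represented group agrees more often; the weights restore each group to its assumed population share. Weighting of this kind only corrects for the variables it uses.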

“…In a sensitivity analysis, we conducted linear regression modelling using Bayesian data integration with responses from the RDD and online samples. 16 17 We retained the five-level response options for each of the three variables measuring perceptions about COVID-19 for these analyses. The Bayesian framework is well suited for integrating multiple data sources of varying quality, such as probability and non-probability samples.…”

Section: Methods (mentioning)

confidence: 99%

“…In this article, we report conjugate difference specification, as it has been shown to have superior properties in simulation studies even in the presence of large selection biases in non-probability samples and in other real-world applications. 16 We used a linear regression model to estimate the association between having a household gun (vs not) with perceptions about COVID-19. To ensure comparability, linear regression models controlled for the same set of demographic factors used in the logistic regression models described above.…”

Section: Methods (mentioning)

confidence: 99%
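The general idea of Bayesian data integration described in these statements can be sketched in a deliberately simplified scalar form: the non-probability sample supplies the prior mean, the prior variance is set to the squared difference between the two sample estimates (so the prior is weaker when the samples disagree), and the probability sample supplies the likelihood. This is an assumed toy analogue for estimating a single mean with a normal-normal update, not the regression specification used in the cited papers; the function name, the variance floor, and the data are hypothetical.

```python
from statistics import mean, variance


def integrate_means(y_nonprob, y_prob, floor=1e-9):
    """Toy normal-normal update for a population mean.

    Prior:      centred on the non-probability sample mean, with variance equal
                to the squared difference between the two sample means.
    Likelihood: probability-sample mean, with its estimated sampling variance
                treated as known.
    Returns (posterior mean, posterior variance).
    """
    m_np = mean(y_nonprob)
    m_p = mean(y_prob)
    prior_var = max((m_np - m_p) ** 2, floor)            # floor avoids division by zero
    like_var = max(variance(y_prob) / len(y_prob), floor)
    prior_prec = 1.0 / prior_var
    like_prec = 1.0 / like_var
    post_var = 1.0 / (prior_prec + like_prec)
    post_mean = post_var * (prior_prec * m_np + like_prec * m_p)
    return post_mean, post_var


# Hypothetical data: a small probability sample and a larger, shifted non-probability sample.
y_prob = [2.0, 3.0, 4.0, 3.0, 2.0, 4.0]
y_nonprob = [3.5, 4.0, 3.0, 4.5, 3.5, 4.0, 3.5, 4.0, 3.0, 4.0]

post_mean, post_var = integrate_means(y_nonprob, y_prob)
```

The posterior mean is a precision-weighted average of the two sample means, and the posterior variance is smaller than the sampling variance of the probability sample alone, which is the motivation for borrowing information from the larger non-probability sample.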


Differences in beliefs about COVID-19 by gun ownership: a cross-sectional survey of Texas adults

et al. 2021

Self Cite


Objectives: We investigated the association between gun ownership and perceptions about COVID-19 among Texas adults as the pandemic emerged. We considered perceived likelihood that the pandemic would lead to civil unrest, perceived importance of taking precautions to prevent transmission and perceptions that the threat of COVID-19 has been exaggerated. Methods: Data were collected from 5 to 12 April 2020, shortly after Texas’ stay-at-home declaration. We generated a sample using random digit dial methods for a telephone survey (n=77, response rate=8%) and by randomly selecting adults from an ongoing panel to complete the survey online (n=1120, non-probability sample). We conducted a logistic regression to estimate differences in perceptions by gun ownership. To account for bias associated with use of a non-probability sample, we used Bayesian data integration and ran linear regression models to produce more accurate measures of association. Results: Among the 60% of Texas adults who reported gun ownership, estimates of past 7-day gun purchases, ammunition purchases and gun carrying were 15% (n=78), 20% (n=100) and 24% (n=130), respectively. We found no evidence of an association between gun ownership with perceived importance of taking precautions to prevent transmission or with perceived likelihood of civil unrest. Results from the logistic regression (OR 1.27, 95% CI 0.99 to 1.63) and the linear regression (β=0.18, 95% CI 0.07 to 0.29) suggest that gun owners may be more likely to believe the threat of COVID-19 was exaggerated. Conclusions: Compared with those without guns, gun owners may have been inclined to downplay the threat of COVID-19 early in the pandemic.


“…For our S_1 data, ρ̂_ry = 0.006, which means that it has a large selection bias or defect (Meng, 2018) especially if the sample size is large; in our case, it is just about 1500. Survey organizations are trying to move away from probability sampling to reduce high cost (Sakshaug et al 2019 and Wisniowski et al 2020). Instead, they use nonprobability sample (for example, web samples) which is less costly and easily available, but possibly brings in biases into the sample.…”

Section: Introduction (mentioning)

confidence: 99%

Integration of Nonprobability and Probability Samples via Survey Weights

Nandram, Choi, Liu 2021

IJSP


Probability sample encounters the problems of increasing cost and nonresponse. The cost has rapidly been increasing in executing a large probability sample survey, and, for some surveys, response rate can be below the 10 percent level. Therefore, statisticians seek some alternative methods. One of them is to use a large nonprobability sample (S_1) supplemented by a small probability sample (S_2). Both samples are taken from the same population and they include common covariates, and a third sample (S_3) is created by combining these two samples; S_1 can be biased and S_2 may have large sample variance. These two problems are reduced by survey weights and combining the two samples. Although S_2 is a small sample, it provides good properties of unbiasedness in estimation and of survey weights. With these known weights, we obtain adjusted sample weights (ASW), and create a sample model from a finite population model. We fit the sample model to obtain its parameters and generate values from the population model. Similarly, we repeat these processes for other two samples, S_1 and S_3 and for different statistical methods. We show reduced biases of the finite population means and reduced variances as the combined sample size becomes large. We analyze sample data to show the reduction of these two errors.

