This article illustrates some effects of dynamic adaptive design in a large government survey. We present findings from the 2015 National Survey of College Graduates Adaptive Design Experiment, including results and discussion of sample representativeness, response rates, and cost. We also consider the effect of truncating data collection (examining alternative stopping rules) on these metrics. In this experiment, we monitored sample representativeness continuously and altered data collection procedures (increasing or decreasing contact effort) to improve it. Cases that were overrepresented in the achieved sample were assigned to more passive modes of data collection (web or paper) or withheld from the group of cases that received survey reminders, whereas underrepresented cases were assigned to telephone follow-ups. The findings suggest that a dynamic adaptive survey design can improve a data quality indicator (the R-indicator) without increasing cost or reducing the response rate. We also find that a dynamic adaptive survey design has the potential to shorten the data collection period, control cost, and increase the timeliness of data delivery, provided that sample representativeness is prioritized over increasing the survey response rate.
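A minimal sketch of this style of representativeness monitoring is shown below, using the common definition of the R-indicator (one minus twice the standard deviation of estimated response propensities). The propensity model, covariate inputs, mode labels, and the mean-propensity routing threshold are illustrative assumptions, not the experiment's actual specification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def r_indicator(propensities):
    """R-indicator: R = 1 - 2 * SD(estimated response propensities)."""
    return 1.0 - 2.0 * np.std(propensities, ddof=1)

def assign_follow_up(frame_covariates, responded):
    """Estimate response propensities from frame data and route cases to more
    or less intensive follow-up based on how well they are represented so far.

    frame_covariates: (n, k) array of sampling-frame covariates (hypothetical).
    responded: 0/1 indicator of response to date.
    """
    model = LogisticRegression(max_iter=1000).fit(frame_covariates, responded)
    rho = model.predict_proba(frame_covariates)[:, 1]
    # Overrepresented groups (high propensity): passive modes, fewer reminders.
    # Underrepresented groups (low propensity): telephone follow-up.
    treatment = np.where(rho > rho.mean(), "web_or_paper", "telephone_follow_up")
    return r_indicator(rho), treatment
```

In a dynamic design, a routine like this would be rerun as returns accumulate, and the treatment assignments refreshed accordingly.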
Responsive survey designs rely upon incoming data from field data collection to optimize cost-quality tradeoffs. To make these design decisions in real time, survey managers rely upon monitoring tools that generate proxy indicators for cost and quality. There is a developing literature on proxy indicators for the risk of nonresponse bias. However, there is very little research on proxy indicators for costs, and almost none aimed at predicting costs under alternative design strategies. Predictions of survey costs and proxy error indicators can be used to optimize survey designs in real time. Using data from the National Survey of Family Growth, we evaluate alternative modeling strategies aimed at predicting survey costs (specifically, interviewer hours). The models include multilevel regression (with random interviewer effects) and Bayesian Additive Regression Trees (BART).
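The sketch below illustrates the two modeling strategies named in the abstract, assuming a case-level data frame with hypothetical variables (hours, calls, tract_distance, interviewer_id). Because the abstract does not specify a BART implementation, a generic tree ensemble is used here as a stand-in for the flexible machine-learning benchmark.

```python
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.ensemble import GradientBoostingRegressor

def fit_cost_models(df: pd.DataFrame):
    """Fit two candidate models for predicting interviewer hours."""
    # Multilevel regression: random intercepts for interviewers.
    mixed = smf.mixedlm(
        "hours ~ calls + tract_distance", df, groups=df["interviewer_id"]
    ).fit()

    # Flexible tree ensemble standing in for BART in this sketch.
    features = df[["calls", "tract_distance"]]
    trees = GradientBoostingRegressor().fit(features, df["hours"])
    return mixed, trees
```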
High-quality survey data collection is becoming more expensive as response rates decline and data collection costs rise. Responsive and adaptive designs have emerged as a framework for targeting and reallocating resources during the data collection period to improve the efficiency of survey data collection. Here, we report on the implementation and evaluation of a responsive design experiment in the National Survey of College Graduates that optimizes the cost-quality tradeoff by minimizing a function of data collection costs and the root mean squared error (RMSE) of a key survey measure, self-reported salary. We used a Bayesian framework to incorporate prior information and generate predictions of estimated response propensity, self-reported salary, and data collection costs for use in our optimization rule. At three points during the data collection process, we implemented the optimization rule and identified cases for which reduced effort would have minimal effect on the RMSE of mean self-reported salary while allowing us to reduce data collection costs. We find that this optimization process allowed us to reduce data collection costs by nearly 10 percent, without a statistically or practically significant increase in the RMSE of mean salary or a decrease in the unweighted response rate. This experiment demonstrates the potential for these types of designs to target data collection resources more effectively to reach survey quality goals.
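A simplified sketch of a rule in this spirit follows: rank active cases by their expected contribution to correcting the estimated mean salary per dollar of predicted follow-up cost, and flag the least valuable cases for reduced effort until a target share of predicted remaining cost is removed. The inputs (per-case cost, propensity, and salary predictions) and the 10 percent budget target are illustrative assumptions, not the survey's actual optimization rule.

```python
import numpy as np

def flag_cases_for_reduced_effort(pred_cost, pred_propensity, pred_salary,
                                  interviewed_salaries, budget_share=0.10):
    """Flag active cases whose continued pursuit adds little to the accuracy
    of the estimated mean salary relative to its predicted cost.

    All inputs are NumPy arrays over active (non-responding) cases, except
    interviewed_salaries, which holds salaries reported so far.
    """
    current_mean = np.mean(interviewed_salaries)
    # Expected contribution of each case: probability of responding times
    # squared distance of its predicted salary from the current estimate.
    expected_value = pred_propensity * (pred_salary - current_mean) ** 2
    value_per_cost = expected_value / pred_cost
    order = np.argsort(value_per_cost)          # least valuable per dollar first
    cumulative_cost = np.cumsum(pred_cost[order])
    n_flag = np.searchsorted(cumulative_cost, budget_share * pred_cost.sum())
    flags = np.zeros(pred_cost.shape[0], dtype=bool)
    flags[order[:n_flag]] = True
    return flags
```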
Responsive survey design (RSD) aims to increase the efficiency of survey data collection via live monitoring of paradata and the introduction of protocol changes when survey errors and increased costs seem imminent. Daily predictions of response propensity for all active sampled cases are among the most important quantities for live monitoring of data collection outcomes, making sound predictions of these propensities essential for the success of RSD. Because it relies on real-time updates of prior beliefs about key design quantities, such as predicted response propensities, RSD stands to benefit from Bayesian approaches. However, empirical evidence of the merits of these approaches is lacking in the literature, and informative prior distributions must be derived for these approaches to be effective. In this paper, we evaluate two approaches to deriving prior distributions for the coefficients of daily response propensity models and assess whether they improve predictions of daily response propensity in a real data collection employing RSD. The first approach involves analyses of historical data from the same survey; the second involves a review of the literature. We find that Bayesian methods based on these two approaches yield higher-quality predictions of response propensity than standard approaches that ignore prior information. This is especially true during the early-to-middle periods of data collection, when survey managers using RSD often consider interventions.
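Below is a minimal sketch, using PyMC, of a daily response propensity model with informative normal priors on the coefficients. The covariates in the design matrix and the source of the prior means and scales (for example, coefficients estimated from a prior wave of the same survey or drawn from published studies) are assumptions for illustration, not the paper's exact model.

```python
import pymc as pm

def fit_daily_propensity(X, y, prior_means, prior_sds):
    """Logistic response propensity model with informative coefficient priors.

    X: (n_cases, k) NumPy design matrix for the current day.
    y: 0/1 response outcomes.
    prior_means, prior_sds: length-k arrays encoding prior information.
    """
    with pm.Model():
        intercept = pm.Normal("intercept", mu=0.0, sigma=2.5)
        beta = pm.Normal("beta", mu=prior_means, sigma=prior_sds,
                         shape=X.shape[1])
        p = pm.math.invlogit(intercept + pm.math.dot(X, beta))
        pm.Bernoulli("responded", p=p, observed=y)
        idata = pm.sample(1000, tune=1000, target_accept=0.9)
    return idata
```

Refitting such a model each day yields updated propensity predictions for the live monitoring dashboard.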
Surveys face difficult choices in managing cost-error trade-offs. Stopping rules have been proposed as a method for managing these trade-offs: a stopping rule limits effort on a selected subset of cases to reduce costs with minimal harm to quality. Previously proposed stopping rules have focused on quality, with an implicit assumption that all cases have the same cost. This assumption is unlikely to be true, particularly when some cases require more effort, and therefore higher costs, than others. We propose a new rule that considers both predicted costs and quality. This rule is tested experimentally against another rule that focuses on stopping cases expected to be difficult to recruit. The experiment was conducted during the 2020 data collection of the Health and Retirement Study (HRS). We test both Bayesian and non-Bayesian (maximum-likelihood, or ML) versions of the rule; the Bayesian version of the prediction models uses historical data to establish prior information. The Bayesian version led to higher-quality data for roughly the same cost, while the ML version led to small reductions in quality with larger reductions in cost, compared to the control rule.
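The sketch below contrasts the two kinds of rules described in the abstract: one that stops cases that are both expensive to finish and expected to add little to quality, and a comparison rule that stops only the hardest-to-recruit cases. The cutoff values and the choice of quality measure are assumptions for illustration, and the predictions could come from either Bayesian or ML models.

```python
import numpy as np

def cost_quality_stop_rule(pred_remaining_cost, pred_quality_gain,
                           cost_cutoff, gain_cutoff):
    """Stop effort on cases that are both expensive to complete and expected
    to contribute little to sample quality; continue all other cases."""
    expensive = pred_remaining_cost > cost_cutoff
    low_value = pred_quality_gain < gain_cutoff
    return expensive & low_value

def difficulty_only_rule(pred_propensity, propensity_cutoff):
    """Comparison rule: stop the cases predicted to be hardest to recruit."""
    return np.asarray(pred_propensity) < propensity_cutoff
```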