Background: Conducting surveys in low- and middle-income countries is often challenging because many areas lack a complete sampling frame, have outdated census information, or have limited data available for designing and selecting a representative sample. Geosampling is a probability-based, gridded population sampling method that addresses some of these issues by using geographic information system (GIS) tools to create logistically manageable area units for sampling. GIS grid cells are overlaid to partition a country’s existing administrative boundaries into area units that vary in size from 50 m × 50 m to 150 m × 150 m. To avoid sending interviewers to unoccupied areas, researchers manually classify grid cells as “residential” or “nonresidential” through visual inspection of aerial images. “Nonresidential” units are then excluded from sampling and data collection. This process of manually classifying sampling units has drawbacks: it is labor intensive, prone to human error, and creates the need for simplifying assumptions during calculation of design-based sampling weights. In this paper, we discuss the development of a deep learning classification model to predict whether aerial images are residential or nonresidential, thus reducing manual labor and eliminating the need for simplifying assumptions. Results: On our test sets, the model performs comparably to a human-level baseline in both Nigeria (94.5% accuracy) and Guatemala (96.4% accuracy), and outperforms baseline machine learning models trained on crowdsourced or remote-sensed geospatial features. Additionally, our findings suggest that this approach can work well in new areas with relatively modest amounts of training data. Conclusions: Gridded population sampling methods like geosampling are becoming increasingly popular in countries with outdated or inaccurate census data because of their timeliness, flexibility, and cost.
Using deep learning models directly on satellite images, we provide a novel method for sample frame construction that identifies residential gridded aerial units. In cases where manual classification of satellite images is used to (1) correct for errors in gridded population data sets or (2) classify grids where population estimates are unavailable, this methodology can help reduce annotation burden with comparable quality to human analysts.
As survey methods evolve, researchers require a comprehensive understanding of the error sources in their data. Comparative studies, which assess differences between the estimates from emerging survey methods and those from traditional surveys, are a popular tool for evaluating total error; however, they do not provide insight on the contributing error sources themselves. The Total Survey Error (TSE) framework is a natural fit for evaluations that examine survey error components across multiple data sources. In this article, we present a case study that demonstrates how the TSE framework can support both qualitative and quantitative evaluations comparing probability and nonprobability surveys. Our case study focuses on five internet panels that are intended to represent the US population and are used to measure health statistics. For these panels, we analyze the total survey error in two ways: (1) using a qualitative assessment that describes how panel construction and management methods may introduce error and (2) using a quantitative assessment that estimates and partitions the total error for two probability-based panels into coverage error and nonresponse error. This work can serve as a “proof of concept” for how the TSE framework may be applied to understand and compare the error structure of probability and nonprobability surveys. For those working specifically with internet panels, our findings will further provide an example of how researchers may choose the panel option best suited to their study aims and help vendors prioritize areas of improvement.
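The idea of partitioning total error into coverage and nonresponse components can be illustrated with a minimal sketch. It assumes a simple additive decomposition of the difference between a panel estimate and an external benchmark; the abstract does not state the estimator actually used, and the function name and all numbers below are hypothetical.

```python
def partition_error(benchmark, covered_mean, respondent_mean):
    """Split (respondent_mean - benchmark) into two additive parts.

    Assumed decomposition (not taken from the article):
      coverage error    = covered population mean - benchmark
      nonresponse error = respondent mean - covered population mean
    """
    coverage = covered_mean - benchmark
    nonresponse = respondent_mean - covered_mean
    total = respondent_mean - benchmark
    return {"coverage": coverage, "nonresponse": nonresponse, "total": total}


# Hypothetical percentages for one health statistic.
parts = partition_error(benchmark=20.0, covered_mean=18.5, respondent_mean=17.0)
for name, value in parts.items():
    print(f"{name}: {value:+.1f} pp")
```

By construction the two components sum to the total, so the sketch only shows how an observed gap could be attributed to each source once a covered-population estimate is available.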
While governments, researchers, and NGOs are exploring ways to leverage big data sources for sustainable development, household surveys are still a critical source of information for dozens of the 232 indicators for the Sustainable Development Goals (SDGs) in low- and middle-income countries (LMICs). Though some countries’ statistical agencies maintain databases of persons or households for sampling, conducting household surveys in LMICs is complicated due to incomplete, outdated, or inaccurate sampling frames. As a means to develop or update household listings in LMICs, this paper explores the use of machine learning models to detect and enumerate building structures directly from satellite imagery in the Kaduna state of Nigeria. Specifically, an object detection model was used to identify and locate buildings in satellite images. In the test set, the model attained a mean average precision (mAP) of 0.48 for detecting structures, with relatively higher values in areas with lower building density (mAP = 0.65). Furthermore, when model predictions were compared against recent household listings from fieldwork in Nigeria, the predictions showed high correlation with household coverage (Pearson = 0.70; Spearman = 0.81). With the need to produce comparable, scalable SDG indicators, this case study explores the feasibility and challenges of using object detection models to help develop timely enumerated household lists in LMICs.
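The abstract reports both Pearson and Spearman correlations between model predictions and household listings. As a rough illustration of why the two can differ, the sketch below computes both with the standard library only. The per-area counts are made up for illustration and are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: linear association between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts per sampled area (illustrative only).
predicted_buildings = [12, 30, 45, 80, 150]
listed_households = [10, 28, 50, 70, 300]

print("Pearson:", round(pearson(predicted_buildings, listed_households), 3))
print("Spearman:", round(spearman(predicted_buildings, listed_households), 3))
```

In this toy data the ordering of areas agrees perfectly, so the rank-based Spearman value is 1, while the one very dense area pulls the Pearson value below 1. A Spearman value above the Pearson value, as reported in the abstract, is consistent with a monotone but not strictly linear relationship.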
Background: Infant and young child feeding (IYCF) practices are important for child survival and healthy growth, but IYCF practices remain suboptimal in Nigeria. The objective of this study was to measure the impact of Alive & Thrive’s IYCF social and behavior change communication intervention on early initiation of breastfeeding, exclusive breastfeeding, and minimum dietary diversity in Kaduna and Lagos States. Methods: Local government areas were randomly allocated to intervention or comparison. Cross-sectional surveys of households with children aged 0–23 months were conducted [N = 6,266 baseline (2017), N = 7,320 endline (2020)]. Logistic regression was used to calculate difference-in-differences estimates (DDEs) of impact on IYCF practices and to assess within-group changes from baseline to endline. Associations between intervention exposures and IYCF practices were tested in both study groups combined. Results: In Kaduna, a positive differential effect of the intervention was found for exclusive breastfeeding (adjusted DDE 8.9 pp, P<0.099). Increases in both study groups from baseline to endline were observed in Kaduna for early initiation of breastfeeding (intervention 12.2 pp, P = 0.010; comparison 6.4 pp, P = 0.118) and minimum dietary diversity (intervention 20.0 pp, P<0.001; comparison 19.7 pp, P<0.001), which eliminated differential effects. In Lagos, no differential intervention impacts were found on IYCF practices because changes in early initiation of breastfeeding from baseline to endline were small in both study groups and increases in both study groups from baseline to endline were observed for exclusive breastfeeding (intervention 8.9 pp, P = 0.05; comparison 6.6 pp, P<0.001) and minimum dietary diversity (intervention 18.9 pp, P<0.001; comparison 24.3 pp, P<0.001). Odds of all three IYCF practices increased with exposure to facility-based interpersonal communication in both states and with community mobilization or mass media exposure in Kaduna.
Conclusions: This evaluation found weak impacts of the Alive & Thrive intervention on IYCF practices in the difference-in-differences analysis because of suspected intervention spillover to the comparison group. Substantial within-group increases in IYCF practices from baseline to endline are likely attributable to the intervention, which was the major IYCF promotion activity in both states. This is supported by the association between intervention exposures and IYCF practices. Trial registration: The study was registered with clinicaltrials.gov (NCT02975063).
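To make the difference-in-differences logic concrete, the sketch below subtracts the comparison-group change from the intervention-group change, using the within-group changes quoted in the abstract. This is a raw subtraction for illustration only. It is not the study's adjusted, regression-based DDE, and the function name is hypothetical.

```python
def did_from_changes(intervention_change_pp, comparison_change_pp):
    """Unadjusted difference-in-differences, in percentage points.

    Each argument is a within-group change from baseline to endline.
    """
    return round(intervention_change_pp - comparison_change_pp, 1)


# Within-group changes (pp) as reported in the abstract:
# (intervention, comparison).
reported_changes = {
    ("Kaduna", "early initiation of breastfeeding"): (12.2, 6.4),
    ("Kaduna", "minimum dietary diversity"): (20.0, 19.7),
    ("Lagos", "exclusive breastfeeding"): (8.9, 6.6),
    ("Lagos", "minimum dietary diversity"): (18.9, 24.3),
}

for (state, practice), (interv, comp) in reported_changes.items():
    did = did_from_changes(interv, comp)
    print(f"{state}, {practice}: {did:+.1f} pp")
```

When both groups improve by similar amounts, as with minimum dietary diversity in Kaduna (20.0 pp versus 19.7 pp), the difference is close to zero even though each group changed substantially. This is the pattern the authors attribute to suspected spillover.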
Context: As response rates to health surveys conducted by telephone continue to decline and costs continue to increase, practitioners are increasingly considering a transition to self-administered mail contact modes. Objective: To compare empirical differences observed across adjacent administrations of the Healthy Chicago Survey (HCS) conducted by telephone versus self-administered via mail contact. Design: Data from the 2016, 2018, and 2020 administrations of the HCS are contrasted, and demographic distributions are benchmarked against the American Community Survey to investigate differences that may be linked to the HCS' transition from a telephone to self-administered mail mode between 2018 and 2020. Setting: All survey data were collected from adult residents of Chicago, Illinois, between 2016 and 2020. Main Outcome Measures: Costs, response rates, key health statistics, demographic distributions, and measures of precision generated from the HCS. Results: The mail mode led to a response rate increase of 6.8% to 38.2% at half the cost per complete. Mail respondents are more likely to be nonminority, female, and hold a college degree. Key health statistic differences are mixed, but design effects are larger in the mail mode, which we attribute to more detailed geographic stratification and weighting employed in 2020. Conclusions: The mail mode is a less costly data collection strategy for the HCS, but it comes with trade-offs. The quasi-random selection of an individual in the household exacerbates sociodemographic distribution disparities.