Despite interest in the built food environment, little is known about the validity of commonly used secondary data. The authors conducted a comprehensive field census identifying the locations of all food outlets using a handheld global positioning system in 8 counties in South Carolina (2008–2009). Secondary data were obtained from 2 commercial companies, Dun & Bradstreet, Inc. (D&B) (Short Hills, New Jersey) and InfoUSA, Inc. (Omaha, Nebraska), and the South Carolina Department of Health and Environmental Control (DHEC). Sensitivity, positive predictive value, and geospatial accuracy were compared. The field census identified 2,208 food outlets, significantly more than the DHEC (n = 1,694), InfoUSA (n = 1,657), or D&B (n = 1,573). Sensitivities were moderate for DHEC (68%) and InfoUSA (65%) and fair for D&B (55%). Combining InfoUSA and D&B data would have increased sensitivity to 78%. Positive predictive values were very good for DHEC (89%) and InfoUSA (86%) and good for D&B (78%). Geospatial accuracy varied, depending on the scale: More than 80% of outlets were geocoded to the correct US Census tract, but only 29%–39% were correctly allocated within 100 m. This study suggests that the validity of common data sources used to characterize the food environment is limited. The marked undercount of food outlets and the geospatial inaccuracies observed have the potential to introduce bias into studies evaluating the impact of the built food environment.
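As an illustration of the two validity measures reported above, the following is a minimal sketch, assuming each outlet has already been matched between the field census and a secondary list and carries a shared identifier. The function name `validity_stats` and the toy identifiers are hypothetical and are not taken from the study; the study's own matching procedure is not described here.

```python
def validity_stats(field_ids, listed_ids):
    """Return (sensitivity, positive predictive value) for a secondary list.

    field_ids  -- outlet identifiers found by the field census (reference standard)
    listed_ids -- outlet identifiers present in the secondary data source
    Assumes outlets were matched beforehand, so equal IDs mean the same outlet.
    """
    field = set(field_ids)
    listed = set(listed_ids)
    true_positives = field & listed  # outlets in the list that really exist

    # Sensitivity: share of existing outlets that the list captures.
    sensitivity = len(true_positives) / len(field) if field else float("nan")
    # PPV: share of listed outlets that were confirmed in the field.
    ppv = len(true_positives) / len(listed) if listed else float("nan")
    return sensitivity, ppv


# Toy data (hypothetical): five outlets exist, the list has four entries,
# one of which ("X") was not found in the field.
field_census = {"A", "B", "C", "D", "E"}
secondary = {"A", "B", "C", "X"}
sens, ppv = validity_stats(field_census, secondary)
print(f"sensitivity={sens:.2f}, ppv={ppv:.2f}")
```

A low sensitivity corresponds to the undercount described in the abstract, while a low positive predictive value corresponds to listed outlets that could not be verified on the ground.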
Background: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. Main text: We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Conclusions: Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation. Electronic supplementary material: The online version of this article (doi:10.1186/s12982-017-0064-4) contains supplementary material, which is available to authorized users.
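To make the partitioning idea concrete, here is a minimal sketch of the split search that a CART-style regression tree performs for a single numeric covariate: it tries every midpoint between adjacent covariate values and keeps the one that minimizes the summed squared error of the two resulting subgroups. This is only an illustration under that assumption; it is not the authors' code, it does not implement CTree (which selects splits through hypothesis tests), and the names `sse` and `best_split` and the toy data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Find the threshold on x that minimizes within-subgroup squared error.

    Returns (threshold, total_sse); threshold is None if no split improves
    on leaving the data as a single group.
    """
    pairs = sorted(zip(x, y))
    best_threshold = None
    best_total = sse([outcome for _, outcome in pairs])

    for i in range(1, len(pairs)):
        lower, upper = pairs[i - 1][0], pairs[i][0]
        if lower == upper:
            continue  # cannot split between identical covariate values
        left = [outcome for _, outcome in pairs[:i]]
        right = [outcome for _, outcome in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_total:
            best_threshold = (lower + upper) / 2
            best_total = total
    return best_threshold, best_total


# Toy data (hypothetical): two clearly separated outcome levels.
covariate = [1, 2, 3, 10, 11, 12]
outcome = [5, 6, 5, 20, 21, 19]
threshold, total = best_split(covariate, outcome)
print(threshold, round(total, 3))
```

A full tree would apply the same search recursively within each subgroup and across all covariates, then stop or prune according to some rule; that stopping decision is where the abstract says CTree's testing framework simplifies matters.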
This trial did not demonstrate a significant effect of STYH participation on change in mean minutes of MVPA or mean BMI 12 months after classes ended, although there was a non-significant association with odds of reduction of BMI (p = 0.07). This study has implications for the design of intervention studies in people with intellectual disability (ID).
Objective: Commercial listings of food retail outlets are increasingly used by community members, food policy councils, and in multi-level intervention research to identify areas with limited access to healthier food. This study quantified the amount of count, type, and geospatial error in two commercial data sources. Methods: InfoUSA and Dun & Bradstreet (D&B) were compared to a validated field census, and validity statistics were calculated. Results: Considering only completeness, D&B data undercounted 24% of existing supermarkets and grocery stores and InfoUSA 29%. Additionally considering the accuracy of outlet type assignment increased the undercount error to 42% and 39%, respectively. Marked overcount existed as well, and only 43% of existing supermarkets were correctly identified with respect to presence, outlet type, and location. Conclusions and Implications: Relying exclusively on secondary data to characterize the food environment will result in substantial error. While extensive data cleaning can offset some error, verification of outlets with a field census is still the method of choice.