Objectives: To generate a synthetic sample (SS) of 1 million individuals that reflects the characteristics of the population recorded in the Health Survey for England (HSE). Methods: We used data from the HSE to determine the age- and gender-dependent distributions of continuous risk factors (height, weight, BMI, systolic blood pressure, total and HDL cholesterol and their ratio, number of cigarettes/day and units of alcohol/week) and the prevalence of binary risk factors (smoking status, diabetes). Spearman rank correlations, including age and gender, were determined for these risk factors. A table of normally distributed random numbers was generated, and Cholesky decomposition was used to replicate the observed Spearman rank correlations in that table. Rank correlations that included binary variables were recalibrated to adjust for numerous tied values. The sample was then generated by a reverse look-up of the gamma distribution value at the random percentile for continuous variables, or by setting a binary variable to 1 when the random percentile fell below the prevalence threshold. Results: Differences between coefficients were no more than 0.5% for any continuous variable. The prevalence of binary factors in the SS was very well matched with the HSE sample. Smoking prevalence was 18.8% and 16.7% in the SS versus 18.4% and 16.5% in the HSE sample, for males and females respectively. Prevalence of diabetes in the SS was 13.3% and 7.7% versus 13.2% and 7.8%, and prevalence of cardiovascular disease was 17.6% and 14.1% versus 18.2% and 14.6%. Comparing the 25th, 50th and 75th percentiles, the maximum differences between the original and synthetic values for BMI and TC/HDL ratio were 0.6 kg/m² and 0.3 respectively. Conclusions: Our new approach generates large synthetic samples with risk factor distributions very closely matching those of the real HSE population. This sample can be used to model the likely impact of new therapies or predict mortality.
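The generation steps described in the Methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `synthetic_sample`, the gamma parameters, the prevalence and the target correlations are all hypothetical values chosen for the example, not figures from the HSE. The sketch covers a single age/gender stratum, omits the recalibration of rank correlations involving binary variables, and adds one step the abstract does not mention: converting each Spearman coefficient to the Pearson coefficient of the underlying normal variables via r = 2 sin(πρ/6).

```python
import numpy as np
from scipy import stats


def synthetic_sample(n, spearman, gamma_params, prevalences, seed=0):
    """Draw n correlated records: gamma-distributed columns first, then 0/1 columns.

    spearman     : (k, k) target Spearman rank correlation matrix
    gamma_params : list of (shape, scale) pairs, one per continuous variable
    prevalences  : list of probabilities, one per binary variable
    """
    rng = np.random.default_rng(seed)
    spearman = np.asarray(spearman, dtype=float)
    k = spearman.shape[0]
    assert k == len(gamma_params) + len(prevalences)

    # Assumption (not in the abstract): map Spearman to Pearson for normal variates.
    pearson = 2.0 * np.sin(np.pi * spearman / 6.0)
    np.fill_diagonal(pearson, 1.0)

    # Cholesky factor imposes the correlation structure on independent normals.
    chol = np.linalg.cholesky(pearson)
    z = rng.standard_normal((n, k)) @ chol.T

    # Convert each correlated normal value to a percentile in (0, 1).
    u = stats.norm.cdf(z)

    out = np.empty((n, k))
    # Continuous variables: reverse look-up (inverse CDF) of the gamma distribution.
    for j, (shape, scale) in enumerate(gamma_params):
        out[:, j] = stats.gamma.ppf(u[:, j], a=shape, scale=scale)
    # Binary variables: 1 when the percentile falls below the prevalence threshold.
    offset = len(gamma_params)
    for j, p in enumerate(prevalences):
        out[:, offset + j] = (u[:, offset + j] < p).astype(float)
    return out


# Hypothetical inputs: two continuous risk factors and one binary flag.
target = [[1.0, 0.4, 0.2],
          [0.4, 1.0, 0.1],
          [0.2, 0.1, 1.0]]
sample = synthetic_sample(
    100_000,
    target,
    gamma_params=[(30.0, 0.9), (60.0, 2.1)],  # illustrative (shape, scale) pairs
    prevalences=[0.18],                        # illustrative prevalence
)
```

Because the gamma inverse CDF is monotone, the rank correlation between continuous columns is preserved from the percentiles. For binary columns the thresholding creates many ties and assigns 1 to low percentiles, so the realised rank correlation differs in size and sign from the latent one; this is presumably what the recalibration step in the Methods addresses, and it is not attempted here.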