BackgroundMissing data are a common feature in many areas of research especially those involving survey data in biological, health and social sciences research. Most of the analyses of the survey data are done taking a complete-case approach, that is taking a list-wise deletion of all cases with missing values assuming that missing values are missing completely at random (MCAR). Methods that are based on substituting the missing values with single values such as the last value carried forward, the mean and regression predictions (single imputations) are also used. These methods often result in potential bias in estimates, in loss of statistical information and in loss of distributional relationships between variables. In addition, the strong MCAR assumption is not tenable in most practical instances.MethodsSince missing data are a major problem in HIV research, the current research seeks to illustrate and highlight the strength of multiple imputation procedure, as a method of handling missing data, which comes from its ability to draw multiple values for the missing observations from plausible predictive distributions for them. This is particularly important in HIV research in sub-Saharan Africa where accurate collection of (complete) data is still a challenge. Furthermore the multiple imputation accounts for the uncertainty introduced by the very process of imputing values for the missing observations. In particular national and subgroup estimates of HIV prevalence in Zimbabwe were computed using multiply imputed data sets from the 2010–11 Zimbabwe Demographic and Health Surveys (2010–11 ZDHS) data. A survey logistic regression model for HIV prevalence and demographic and socio-economic variables was used as the substantive analysis model. The results for both the complete-case analysis and the multiple imputation analysis are presented and discussed.ResultsAcross different subgroups of the population, the crude estimates of HIV prevalence are generally not identical but their variations are consistent between the two approaches (complete-case analysis and multiple imputation analysis). The estimates of standard errors under the multiple imputation are predominantly smaller, hence leading to narrower confidence intervals, than under the complete case analysis. Under the logistic regression adjusted odds ratios vary greatly between the two approaches. The model based confidence intervals for the adjusted odds ratios are wider under the multiple imputation which is indicative of the inclusion of a combined measure of the within and between imputation variability.ConclusionsThere is considerable variation between estimates obtained between the two approaches. The use of multiple imputations allows the uncertainty brought about by the imputation process to be measured. This consequently yields more reliable estimates of the parameters of interest and reduce the chances of declaring significant effects unnecessarily (type I error). In addition, the utilization of the powerful and flexible statistical computing packag...
Estimates of HIV prevalence computed using data obtained from sampling a subgroup of the national population may lack the representativeness of all the relevant domains of the population. These estimates are often computed on the assumption that HIV prevalence is uniform across all domains of the population. Use of appropriate statistical methods together with population-based survey data can enhance better estimation of national and subgroup level HIV prevalence and can provide improved explanations of the variation in HIV prevalence across different domains of the population. In this study we computed design-consistent estimates of HIV prevalence, and their respective 95% confidence intervals at both the national and subgroup levels. In addition, we provided a multivariable survey logistic regression model from a generalized linear modelling perspective for explaining the variation in HIV prevalence using demographic, socio-economic, socio-cultural and behavioural factors. Essentially, this study borrows from the proximate determinants conceptual framework which provides guiding principles upon which socio-economic and socio-cultural variables affect HIV prevalence through biological behavioural factors. We utilize the 2010–11 Zimbabwe Demographic and Health Survey (2010–11 ZDHS) data (which are population based) to estimate HIV prevalence in different categories of the population and for constructing the logistic regression model. It was established that HIV prevalence varies greatly with age, gender, marital status, place of residence, literacy level, belief on whether condom use can reduce the risk of contracting HIV and level of recent sexual activity whereas there was no marked variation in HIV prevalence with social status (measured using a wealth index), method of contraceptive and an individual’s level of education.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.