Innocent Ngaruye scite author profile

Aim: HIV prevention measures in sub-Saharan Africa are still short of attaining the UNAIDS 90-90-90 fast-track targets set in 2014. Identifying predictors of HIV status may facilitate targeted screening interventions that improve health care. We aimed to identify HIV predictors and to predict persons at high risk of infection. Method: We applied machine learning approaches to build models using Population-based HIV Impact Assessment (PHIA) data for 41,939 male and 45,105 female respondents, with 30 and 40 variables respectively, from four countries in sub-Saharan Africa. We trained and validated the algorithms on 80% of the data and tested on the remaining 20%, rotating the left-out country. The algorithm with the best mean F1 score was retained and trained on the most predictive variables. We used the model to identify people living with HIV and individuals with a higher likelihood of contracting the disease. Results: The XGBoost algorithm appeared to significantly improve identification of HIV positivity over the other five algorithms, with mean F1 scores of 90% and 92% for males and females respectively. Among the most predictive features in both sexes were eight variables: age, relationship with family head, the highest level of education, highest grade at that school level, work for payment, avoiding pregnancy, age at the first experience of sex, and wealth quintile. Model performance using these variables increased significantly compared to having all the variables included. We found that five male and 19 female individuals would require testing to find one HIV-positive individual. We also predicted that 4.14% of males and 10.81% of females are at high risk of infection. Conclusion: Our findings suggest a potential use of the XGBoost algorithm with socio-behavioural data for identifying HIV predictors and predicting individuals at high risk of infection for targeted screening.
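
The F1 score and the "testing to find one HIV positive individual" figure in this abstract can be illustrated with a small sketch. The Python below is not the authors' code: the labels are toy data, the function names are hypothetical, and treating the number of people who need testing as the reciprocal of precision is an assumption about how that figure is defined.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def number_needed_to_test(precision):
    """Assumed definition: people flagged by the model per true positive found."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    return 1.0 / precision


# Toy labels, purely illustrative (not PHIA data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
nnt = number_needed_to_test(precision)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f} NNT={nnt:.2f}")
```

Under this reading, a precision of 0.20 among people flagged by the model would correspond to testing five people to find one positive.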

Small area estimation under a multivariate linear model for repeated measures data

Ngaruye, Nzabanita, Rosen, et al. 2017

Communications in Statistics - Theory and Methods

In this article, small area estimation under a multivariate linear model for repeated measures data is considered. The proposed model borrows strength both across small areas and over time. It accounts for repeated surveys, grouped response units and random effects variations. Estimation of model parameters is discussed within a likelihood-based approach. Predictions of random effects and of small area means across time points and per group unit are derived. A parametric bootstrap method is proposed for estimating the mean-squared errors of the predicted small area means. Results are supported by a simulation study.
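
The idea of a parametric bootstrap for the mean-squared error of predicted small area means can be sketched in a much simpler setting than the article's. The Python below assumes a univariate, single-time-point random-effects model with known variance components, which is a deliberate simplification and not the multivariate repeated-measures model of the paper. All names and numbers are hypothetical.

```python
import random


def shrinkage_predictor(ybar, v, sigma_u2):
    """Predict area means by shrinking each direct estimate toward a weighted mean.

    ybar[i] is the direct estimate for area i, v[i] its sampling variance,
    sigma_u2 the (assumed known) variance of the area random effects.
    """
    w = [1.0 / (sigma_u2 + vi) for vi in v]
    mu_hat = sum(wi * yi for wi, yi in zip(w, ybar)) / sum(w)
    gamma = [sigma_u2 / (sigma_u2 + vi) for vi in v]
    preds = [mu_hat + g * (yi - mu_hat) for g, yi in zip(gamma, ybar)]
    return mu_hat, preds


def bootstrap_mse(ybar, v, sigma_u2, n_boot=2000, seed=0):
    """Parametric bootstrap estimate of the MSE of each predicted area mean."""
    rng = random.Random(seed)
    mu_hat, _ = shrinkage_predictor(ybar, v, sigma_u2)
    sq_err = [0.0] * len(ybar)
    for _b in range(n_boot):
        # Simulate true area means and direct estimates from the fitted model.
        theta_star = [mu_hat + rng.gauss(0.0, sigma_u2 ** 0.5) for _i in v]
        ybar_star = [t + rng.gauss(0.0, vi ** 0.5) for t, vi in zip(theta_star, v)]
        # Re-run the predictor on the simulated data and record its squared error.
        _, pred_star = shrinkage_predictor(ybar_star, v, sigma_u2)
        for i, (p, t) in enumerate(zip(pred_star, theta_star)):
            sq_err[i] += (p - t) ** 2
    return [s / n_boot for s in sq_err]


# Hypothetical direct estimates and sampling variances for six areas.
ybar = [10.2, 9.1, 11.5, 8.7, 10.9, 9.8]
v = [1.0, 0.5, 0.25, 1.0, 0.5, 0.25]
mse = bootstrap_mse(ybar, v, sigma_u2=1.0)
print([round(m, 3) for m in mse])
```

In this toy setting the bootstrap MSE of each shrinkage prediction should fall below the sampling variance of the corresponding direct estimate, which is the sense in which the predictor borrows strength across areas.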

Comparing growth velocity of HIV exposed and non-exposed infants: An observational study of infants enrolled in a randomized control trial in Zambia

et al. 2021

Background: Impaired growth among infants remains one of the leading nutrition problems globally. In this study, we aimed to compare the growth trajectory rate and evaluate growth trajectory characteristics among children under two years of age in Zambia who are HIV exposed uninfected (HEU) and HIV unexposed uninfected (HUU). Method: Our study used data from the ROVAS II study (PACTR201804003096919), an open-label randomized control trial of two versus three doses of live, attenuated, oral Rotarix™ administered at 6 and 10 weeks, or at 6 and 10 weeks plus an additional dose at 9 months of age, conducted at George clinic in Lusaka, Zambia. Anthropometric measurements (height and weight) were collected at all scheduled and unscheduled visits. We defined linear growth velocity as the rate of change in height, estimated it as the first derivative of a mixed-effects model with fractional polynomial transformations and, thereafter, used the second derivative test to determine the peak height and the age at peak height. Results: We included 212 infants in this study, with a median age of 6 weeks (IQR: 6–6). Of these, 97 (45.3%) were female, 35 (16.4%) were stunted, and 59 (27.6%) were exposed to HIV at baseline. Growth velocity was consistently below the 3rd percentile of the WHO linear growth standard for HEU and HUU children. The peak height and age at peak height among HEU children were 74.7 cm (95% CI = 73.9–75.5) and 15.5 months (95% CI = 14.7–16.3) respectively, and those for HUU children were 73 cm (95% CI = 72.1–74.0) and 15.6 months (95% CI = 14.5–16.6) respectively. Conclusion: We found no difference in growth trajectories between infants who are HEU and HUU. However, the data suggest that poor linear growth is universal and profound in this cohort and may have already occurred in utero.
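
The velocity and peak calculation described in the Method can be written out for an illustrative case. The abstract does not state the degree or powers of the fractional polynomial, so the two-term form with distinct non-zero powers $p_1 \neq p_2$ below is an assumption, and only the fixed-effects part of the mixed model is shown, with $t$ denoting age.

```latex
\begin{align*}
h(t)   &= \beta_0 + \beta_1 t^{p_1} + \beta_2 t^{p_2}
        && \text{(height; assumed two-term fractional polynomial)} \\
v(t)   &= h'(t) = \beta_1 p_1 t^{p_1 - 1} + \beta_2 p_2 t^{p_2 - 1}
        && \text{(linear growth velocity)} \\
h''(t) &= \beta_1 p_1 (p_1 - 1) t^{p_1 - 2} + \beta_2 p_2 (p_2 - 1) t^{p_2 - 2} \\
v(t^*) &= 0 \;\Longrightarrow\;
          t^* = \left( -\frac{\beta_2 p_2}{\beta_1 p_1} \right)^{1/(p_1 - p_2)}
        && \text{(defined when the ratio is positive)} \\
h''(t^*) &< 0 \;\Longrightarrow\; t^* \text{ is the age at peak height and } h(t^*) \text{ the peak height.}
\end{align*}
```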

Performance of Machine Learning Classifiers in Classifying Stunting among Under-Five Children in Zambia

et al. 2022

Stunting is a global public health issue. We sought to train and evaluate machine learning (ML) classification algorithms on the Zambia Demographic Health Survey (ZDHS) dataset to predict stunting among children under the age of five in Zambia. We applied Logistic Regression (LR), Random Forest (RF), Support Vector Classification (SVC), XGBoost (XgB) and Naïve Bayes (NB) algorithms to the 2018 ZDHS dataset to predict the probability of stunting among children under five years of age. We calibrated predicted probabilities and plotted the calibration curves to compare model performance. We computed accuracy, recall, precision and F1 for each machine learning algorithm. About 2327 (34.2%) children were stunted. Thirteen of fifty-eight features were selected for inclusion in the model using random forest. Calibrating the predicted probabilities improved the performance of machine learning algorithms when evaluated using calibration curves. RF was the most accurate algorithm, with an accuracy score of 79% in the testing and 61.6% in the training data, while Naïve Bayes was the worst-performing algorithm for predicting stunting among children under five in Zambia using the 2018 ZDHS dataset. ML models aid quick diagnosis of stunting and the timely development of interventions aimed at preventing stunting.
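
The calibration curves mentioned in this abstract compare predicted probabilities with observed outcome rates. The Python below is a minimal sketch of the binning behind such a curve, using equal-width bins and toy values. It does not reproduce the calibration method the authors applied, which the abstract does not specify, and the function name and data are hypothetical.

```python
def calibration_curve(y_true, y_prob, n_bins=4):
    """Return (mean predicted probability, observed positive rate, count)
    for each non-empty equal-width probability bin."""
    prob_sum = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for t, p in zip(y_true, y_prob):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        prob_sum[b] += p
        positives[b] += t
        counts[b] += 1
    return [
        (prob_sum[b] / counts[b], positives[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Toy predictions, purely illustrative (not ZDHS data).
y_true = [0, 0, 0, 1, 1, 0, 1, 1]
y_prob = [0.05, 0.15, 0.30, 0.35, 0.55, 0.65, 0.85, 0.95]

curve = calibration_curve(y_true, y_prob)
for mean_pred, observed, count in curve:
    print(f"predicted={mean_pred:.3f} observed={observed:.3f} n={count}")
```

A well-calibrated model has points close to the diagonal, where the mean predicted probability in a bin matches the observed rate.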

Use of Machine Learning Techniques to Identify HIV Predictors for Screening

McSharry, Mutai, Ngaruye, et al. 2020

Preprint

Aim: HIV prevention measures in sub-Saharan Africa are still short of attaining the UNAIDS 90-90-90 fast-track targets set in 2014. Identifying predictors of HIV status may facilitate targeted screening interventions that improve health care. We aimed to identify HIV predictors and to predict persons at high risk of infection. Method: We applied six machine learning approaches to build models using Population-based HIV Impact Assessment (PHIA) data for 41,939 male and 45,105 female respondents, with 24 and 29 variables respectively, from four countries in sub-Saharan Africa. We trained and validated the six algorithms on 80% of the data and tested on the remaining 20%, rotating the left-out country. The algorithm with the best mean F1 score was retained and trained on the most predictive variables. We used the model to identify people living with HIV and individuals with a higher likelihood of contracting the disease. Results: The XGBoost algorithm appeared to significantly improve identification of HIV positivity over the other five algorithms, with mean F1 scores of 78.9% and 92.8% for males and females respectively. Among the most predictive features in both sexes were eight variables: age, relationship with family head, the highest level of education, highest grade at that school level, work for payment, avoiding pregnancy, age at the first experience of sex, and wealth quintile. Model performance using these variables increased significantly compared to having all the variables included. We found that five male and seven female individuals would require testing to find one HIV-positive individual. We also predicted that 4.14% of males and 10.81% of females are at high risk of infection. Conclusion: Our findings suggest a potential use of the XGBoost algorithm with socio-behavioural data for identifying HIV predictors and predicting individuals at high risk of infection for targeted screening.
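
The rotation around a left-out country described in the Method can be sketched as a leave-one-group-out split. The Python below is an illustration under assumptions: the abstract does not name the four countries or detail how the 80%/20% division interacts with the rotation, so the labels "A" to "D" and the function name are hypothetical, and only the hold-out rotation itself is shown.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), one split per group."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = sorted(
            i for g, idx in by_group.items() if g != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


# Hypothetical country label for each respondent (placeholders, not PHIA data).
countries = ["A", "A", "B", "C", "B", "D", "C", "D"]

splits = list(leave_one_group_out(countries))
for held_out, train_idx, test_idx in splits:
    print(f"held out {held_out}: train={train_idx} test={test_idx}")
```

Each split would be used to fit a model on the training rows and score it on the held-out country, with the per-split F1 scores averaged to compare algorithms.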
