Background: In developing countries such as Indonesia, limited resources for routine mass Coronavirus Disease 2019 (COVID-19) RT-PCR testing leave healthcare workers at heightened risk of late-detected and undetected infection, which increases the spread of the virus. Accessible and accurate methods are needed to identify COVID-19-positive healthcare workers. This study investigated the use of machine learning classifiers to predict the risk of COVID-19 positivity in high-risk populations where resources are limited and accessibility is desired.

Methods: Two sets of models were built: a full model, trained and tested on data from healthcare workers in both Jakarta and Semarang, and a Jakarta model, trained on Jakarta healthcare workers and tested on Semarang healthcare workers. Models were assessed by the area under the receiver-operating-characteristic curve (AUC), average precision (AP), and Brier score (BS). Shapley additive explanations (SHAP) were used to analyze feature importance. A total of 5,394 healthcare workers were included in the final dataset.

Results: For the full model, a voting classifier composed of a random forest and a logistic regression was selected as the algorithm of choice. It achieved a training AUC (mean [standard deviation (SD)]) of 0.832 [0.033] and AP (mean [SD]) of 0.476 [0.042], and performed well during testing, with an AUC of 0.753 and an AP of 0.504. For the Jakarta model, a voting classifier composed of a random forest and an XGBoost classifier performed best during cross-validation, with an AUC (mean [SD]) of 0.827 [0.023] and AP (mean [SD]) of 0.461 [0.025]. When tested on the Semarang healthcare workers, it achieved an AUC of 0.725 and an AP of 0.582.

Conclusions: Our models yielded high predictive performance and can be used as an alternative COVID-19 screening method for healthcare workers in Indonesia, although low adoption by partner hospitals, despite the models' usefulness, remains a concern.
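For readers less familiar with the three evaluation metrics named in the Methods, the following is a minimal pure-Python sketch of how AUC, AP, and the Brier score can be computed from binary labels and predicted probabilities. It is an illustration only, not the study's code: the function names and the toy data are hypothetical, and `soft_vote` assumes the voting classifiers averaged predicted probabilities (soft voting), which the abstract does not state.

```python
from typing import List, Sequence


def soft_vote(*model_probs: Sequence[float]) -> List[float]:
    """Average per-sample probabilities across models (assumed soft voting)."""
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A tie between a positive and a negative counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AP: precision at each score threshold, weighted by the recall gained there."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    ap, tp, i = 0.0, 0, 0
    while i < len(ranked):
        tp_before, j = tp, i
        # Samples with tied scores share a single threshold.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        ap += (tp - tp_before) / n_pos * (tp / j)
        i = j
    return ap


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


if __name__ == "__main__":
    # Toy labels and probabilities, invented purely for illustration.
    labels = [0, 0, 1, 1]
    model_a = [0.2, 0.3, 0.4, 0.9]
    model_b = [0.0, 0.5, 0.3, 0.7]
    ensemble = soft_vote(model_a, model_b)
    print("AUC:", roc_auc(labels, ensemble))
    print("AP:", average_precision(labels, ensemble))
    print("Brier:", brier_score(labels, ensemble))
```

AUC and AP depend only on how the predictions rank positives against negatives, while the Brier score also penalizes poorly calibrated probabilities, which is one reason to report all three together. Because AP is sensitive to the share of positives in the data, AP values from cohorts with different positivity rates are not directly comparable.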