This paper presents an impact assessment for the imputation of missing data. The data set used is HIV seroprevalence data from an antenatal clinic study survey performed in 2001. Data imputation is performed through five methods: Random Forests, Autoassociative Neural Networks with Genetic Algorithms, Autoassociative Neuro-Fuzzy configurations, and two Random Forest and Neural Network based hybrids. Results indicate that Random Forests are superior in imputing missing data in terms of both accuracy and computation time, with accuracy increases of up to 32% on average for certain variables when compared with autoassociative networks. While the hybrid systems have significant promise, they are hindered by their Neural Network components. The imputed data is used to test for impact in three ways: through statistical analysis, through HIV status classification, and through probability prediction with Logistic Regression. Results indicate that these methods are fairly immune to imputed data, and that the impact is not highly significant, with linear correlations of 96% between HIV probability prediction and a set of two imputed variables using the logistic regression analysis.
Chest X-rays are a vital diagnostic tool in the workup of many patients. Similar to most medical imaging modalities, they are profoundly multi-modal and are capable of visualising a variety of combinations of conditions. There is an ever-pressing need for greater quantities of labelled images to drive forward the development of diagnostic tools; however, this is in direct opposition to concerns regarding patient confidentiality, which constrain access through permission requests and ethics approvals. Previous work has sought to address these concerns by creating class-specific generative adversarial networks (GANs) that synthesise images to augment training data. These approaches cannot be scaled, as they introduce computational trade-offs between model size and class number, which place fixed limits on the quality that such generates can achieve. We address this concern by introducing latent class optimisation, which enables efficient, multi-modal sampling from a GAN and with which we synthesise a large archive of labelled generates. We apply a Progressive Growing GAN (PGGAN) to the task of unsupervised X-ray synthesis and have radiologists evaluate the clinical realism of the resultant samples. We provide an in-depth review of the properties of varying pathologies seen on generates as well as an overview of the extent of disease diversity captured by the model. We validate the application of the Fréchet Inception Distance (FID) to measure the quality of X-ray generates and find that the resulting scores are similar to those of other high-resolution tasks. We quantify X-ray clinical realism by asking radiologists to distinguish between real and fake scans and find that generates are more likely to be classed as real than by chance, but there is still progress required to achieve true realism. We confirm these findings by evaluating synthetic classification model performance on real scans.
We conclude by discussing the limitations of PGGAN generates and how to achieve controllable, realistic generates going forward. We release our source code, model weights, and an archive of labelled generates.
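For readers unfamiliar with the FID used in the abstract above, its conventional definition is the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The symbols here are notation assumed for illustration and do not come from the abstract: $\mu_r, \Sigma_r$ are the feature mean and covariance for real scans, and $\mu_g, \Sigma_g$ are those for generates.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the two feature distributions are closer.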
Interest in the mathematical modeling of infectious diseases has increased due to the COVID-19 pandemic. However, many medical students do not have the required background in coding or mathematics to engage optimally in this approach. System dynamics is a methodology for implementing mathematical models as easy-to-understand stock-flow diagrams. Remarkably, creating stock-flow diagrams is the same process as creating the equivalent differential equations. Yet, its visual nature makes the process simple and intuitive. We demonstrate the simplicity of system dynamics by applying it to epidemic models including a model of COVID-19 mutation. We then discuss the ease with which far more complex models can be produced by implementing a model comprising eight differential equations of a Chikungunya epidemic from the literature. Finally, we discuss the learning environment in which the teaching of the epidemic modeling occurs. We advocate the widespread use of system dynamics to empower those who are engaged in infectious disease epidemiology, regardless of their mathematical background.
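The claimed equivalence between a stock-flow diagram and a set of differential equations can be illustrated with a minimal sketch. It assumes a basic SIR epidemic model, which the abstract does not specify; the function name `simulate_sir`, the parameter names, and the simple Euler time-stepping are all illustrative choices, not the authors' implementation. Each stock (S, I, R) is a level, each flow is a rate, and a stock changes by its inflows minus its outflows, which is exactly what the corresponding differential equation states.

```python
def simulate_sir(beta, gamma, s0, i0, r0, dt, steps):
    """Euler-integrate an SIR model expressed as stocks and flows.

    beta  : transmission rate (assumed parameter name)
    gamma : recovery rate (assumed parameter name)
    Returns a list of (S, I, R) tuples, one per time step.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, constant in this closed model
    history = [(s, i, r)]
    for _ in range(steps):
        # Flows: people moving between stocks per unit time.
        infection = beta * s * i / n   # S -> I
        recovery = gamma * i           # I -> R
        # Stocks: each level changes by inflow minus outflow.
        # Equivalent to dS/dt = -infection, dI/dt = infection - recovery,
        # dR/dt = recovery.
        s -= infection * dt
        i += (infection - recovery) * dt
        r += recovery * dt
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    run = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0,
                       dt=0.1, steps=1000)
    peak_infected = max(state[1] for state in run)
    print(f"peak infected: {peak_infected:.1f}")
```

Adding a further compartment in this style means adding one stock and the flows that connect it, which mirrors how a diagram would be extended.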