Advances in scientific computing have allowed the development of complex models that are routinely applied to problems in disease epidemiology, public health and decision making. The utility of these models depends in part on how well they can reproduce empirical data. However, fitting such models to real-world data is greatly hindered both by large numbers of input and output parameters and by long run times, so that many modelling studies lack a formal calibration methodology. We present a novel method that has the potential to improve the calibration of complex infectious disease models (hereafter called simulators). The method is presented as a tutorial and a case study in which we history match a dynamic, event-driven, individual-based stochastic HIV simulator, using extensive demographic, behavioural and epidemiological data available from Uganda. The tutorial describes history matching and emulation. History matching is an iterative procedure that reduces the simulator's input space by identifying and discarding regions that are unlikely to provide a good match to the empirical data. It relies on the computational efficiency of a Bayesian representation of the simulator, known as an emulator. Emulators mimic the simulator's behaviour but are often several orders of magnitude faster to evaluate. In the case study, we use a 22-input simulator and fit its 18 outputs simultaneously. After 9 iterations of history matching, a non-implausible region of the simulator input space was identified that was times smaller than the original input space. Simulator evaluations made within this region were found to have a 65% probability of fitting all 18 outputs. History matching and emulation are useful additions to the toolbox of infectious disease modellers. Further research is required to address the stochastic nature of the simulator explicitly and to account for correlations between outputs.
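The discarding step in history matching is usually expressed through an implausibility measure: for each output, the distance between the observed value z and the emulator expectation E[f(x)], scaled by the combined emulator, observation-error and model-discrepancy variance. The sketch below is a minimal illustration under stated assumptions, not the authors' code. It takes the maximum implausibility over outputs with a cutoff of 3, which is one common convention. The `toy_emulator`, the observed values `z` and all variances are hypothetical placeholders standing in for a fitted emulator and real data.

```python
import numpy as np


def implausibility(em_mean, em_var, z, var_obs, var_md):
    """Per-output implausibility |z - E[f(x)]| / sqrt(total variance).

    em_mean, em_var: shape (n_points, n_outputs), emulator expectation/variance.
    z, var_obs, var_md: shape (n_outputs,), observed data, observation-error
    variance and model-discrepancy variance (broadcast across points).
    """
    total_var = em_var + var_obs + var_md
    return np.abs(z - em_mean) / np.sqrt(total_var)


def non_implausible(em_mean, em_var, z, var_obs, var_md, cutoff=3.0):
    """Mask of points whose maximum implausibility over outputs is below cutoff."""
    imp = implausibility(em_mean, em_var, z, var_obs, var_md)
    return imp.max(axis=1) < cutoff


def toy_emulator(x):
    # Hypothetical stand-in for a fitted emulator: a cheap mean function of the
    # two inputs plus a constant emulator (code) variance.
    mean = np.column_stack([x[:, 0] + x[:, 1], x[:, 0] * x[:, 1]])
    var = np.full_like(mean, 0.01)
    return mean, var


# One "wave": screen a large space-filling sample with the emulator only.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(10_000, 2))
mean, var = toy_emulator(x)
z = np.array([1.0, 0.25])  # hypothetical observed outputs
keep = non_implausible(mean, var, z,
                       var_obs=np.array([0.001, 0.001]),
                       var_md=np.array([0.001, 0.001]))
print(f"{keep.mean():.3f} of the sampled space is non-implausible")
```

In a full history match, the simulator would then be rerun at points drawn from the retained region, new emulators fitted there, and the screening repeated in the next wave.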
Semi-analytic models are a powerful tool for studying the formation of galaxies. However, these models inevitably involve a significant number of poorly constrained parameters that must be adjusted to provide an acceptable match to the observed Universe. In this paper, we set out to quantify the degree to which observational data sets can constrain the model parameters. By revealing degeneracies in the parameter space, we hope to better understand the key physical processes probed by the data. We use novel mathematical techniques to explore the parameter space of the GALFORM semi-analytic model. We base our investigation on the Bower et al. version of GALFORM, adopting the same methodology of selecting model parameters based on an acceptable match to the local bJ and K luminosity functions. Since the GALFORM model is inherently approximate, we explicitly include a model discrepancy term when deciding whether a match is acceptable. The model contains 16 parameters that are poorly constrained by our prior understanding of the galaxy formation processes and that can plausibly be adjusted between reasonable limits. We investigate this parameter space using the Model Emulator technique, constructing a Bayesian approximation to the GALFORM model that can be rapidly evaluated at any point in parameter space. The emulator returns both an expectation for the GALFORM model and an uncertainty, which allows us to eliminate regions of parameter space in which it is implausible that a GALFORM run would match the luminosity function data. By combining successive waves of emulation, we show that only 0.26 per cent of the initial volume is of interest for further exploration. However, within this region we show that the Bower et al. model is only one choice from an extended subspace of model parameters that can provide equally acceptable fits to the luminosity function data.
We explore the geometry of this region and begin to investigate the physical connections between parameters that are exposed by this analysis. We also consider the impact of adding further observational data to constrain the parameter space. We see that the known tensions in the Bower et al. model lead to a further reduction in the successful parameter space.
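An emulator that "returns both an expectation and an uncertainty" is often built from a Gaussian process. The sketch below assumes the simplest form: a zero prior mean, a squared-exponential covariance with fixed hyperparameters, and a small nugget for numerical stability. Emulators used in practice typically add a regression mean function and estimate the hyperparameters, so treat this as an illustration only. `GPEmulator`, `sq_exp_kernel` and `toy_simulator` are hypothetical names, and the toy function merely stands in for an expensive run such as GALFORM.

```python
import numpy as np


def sq_exp_kernel(a, b, variance=1.0, length=0.3):
    """Squared-exponential covariance between rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length**2)


class GPEmulator:
    """Zero-mean Gaussian-process emulator for a single scalar output."""

    def __init__(self, x_train, y_train, variance=1.0, length=0.3, nugget=1e-6):
        self.x = x_train
        self.variance, self.length = variance, length
        k = sq_exp_kernel(x_train, x_train, variance, length)
        k += nugget * np.eye(len(x_train))  # jitter keeps the Cholesky stable
        self.chol = np.linalg.cholesky(k)
        # alpha = K^{-1} y, computed via the Cholesky factor L (K = L L^T)
        self.alpha = np.linalg.solve(self.chol.T, np.linalg.solve(self.chol, y_train))

    def predict(self, x_new):
        """Return the emulator expectation and variance at x_new (shape (m, d))."""
        k_star = sq_exp_kernel(x_new, self.x, self.variance, self.length)
        mean = k_star @ self.alpha
        v = np.linalg.solve(self.chol, k_star.T)
        var = self.variance - (v**2).sum(axis=0)
        return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


def toy_simulator(x):
    # Hypothetical cheap stand-in for an expensive simulator run.
    return np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2


rng = np.random.default_rng(1)
x_train = rng.uniform(0.0, 1.0, size=(30, 2))
emulator = GPEmulator(x_train, toy_simulator(x_train))
x_test = rng.uniform(0.0, 1.0, size=(5, 2))
mean, var = emulator.predict(x_test)
```

The variance is close to zero at the design points and grows towards the prior variance away from them. This growth is what stops a history match from discarding poorly explored regions too early.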