Computer simulations are invaluable tools for scientific discovery. However, accurate simulations are often slow to execute, which limits their applicability to extensive parameter exploration, large-scale data analysis, and uncertainty quantification. A promising route to accelerate simulations by building fast emulators with machine learning requires large training datasets, which can be prohibitively expensive to obtain with slow simulations. Here we present a method based on neural architecture search to build accurate emulators even with a limited number of training data. The method successfully emulates simulations in 10 scientific cases including astrophysics, climate sci-ence, biogeochemistry, high energy density physics, fusion energy, and seismology, using the same super-architecture, algorithm, and hyperparameters. Our approach also inherently provides emulator uncertainty estimation, adding further confidence in their use. We anticipate this work will accelerate research involving expensive simulations, allow more extensive parameters exploration, and enable new, previously unfeasible computational discovery.
Abstract. The skill of global ocean biogeochemical models, and the earth system models in which they are embedded, can be improved by systematic calibration of the parameter values against observations. However, such tuning is seldom undertaken as these models are computationally very expensive. Here we investigate the performance of DFO-LS, a local, derivative-free optimisation algorithm which has been designed for computationally expensive models with irregular model–data misfit landscapes typical of biogeochemical models. We use DFO-LS to calibrate six parameters of a relatively complex global ocean biogeochemical model (MOPS) against synthetic dissolved oxygen, phosphate and nitrate “observations” from a reference run of the same model with a known parameter configuration. The performance of DFO-LS is compared with that of CMA-ES, another derivative-free algorithm that was applied in a previous study to the same model in one of the first successful attempts at calibrating a global model of this complexity. We find that DFO-LS successfully recovers five of the six parameters in approximately 40 evaluations of the misfit function (each one requiring a 3000-year run of MOPS to equilibrium), while CMA-ES needs over 1200 evaluations. Moreover, DFO-LS reached a “baseline” misfit, defined by observational noise, in just 11–14 evaluations, whereas CMA-ES required approximately 340 evaluations. We also find that the performance of DFO-LS is not significantly affected by observational sparsity, however fewer parameters were successfully optimised in the presence of observational uncertainty. The results presented here suggest that DFO-LS is sufficiently inexpensive and robust to apply to the calibration of complex, global ocean biogeochemical models.
Abstract. The performance of global ocean biogeochemical models, and the Earth System Models in which they are embedded, can be improved by systematic calibration of the parameter values against observations. However, such tuning is seldom undertaken as these models are computationally very expensive. Here we investigate the performance of DFO-LS, a local, derivative-free optimisation algorithm which has been designed for computationally expensive models with irregular model-data misfit landscapes typical of biogeochemical models. We use DFO-LS to calibrate six parameters of a relatively complex global ocean biogeochemical model (MOPS) against synthetic dissolved oxygen, inorganic phosphate and inorganic nitrate observations from a reference run of the same model with a known parameter configuration. The performance of DFO-LS is compared with that of CMA-ES, another derivative-free algorithm that was applied in a previous study to the same model in one of the first successful attempts at calibrating a global model of this complexity. We find that DFO-LS successfully recovers 5 of the 6 parameters in approximately 40 evaluations of the misfit function (each one requiring a 3000 year run of MOPS to equilibrium), while CMA-ES needs over 1200 evaluations. Moreover, DFO-LS reached a baseline misfit, defined by observational noise, in just 11–14 evaluations, whereas CMA-ES required approximately 340 evaluations. We also find that the performance of DFO-LS is not significantly affected by observational sparsity, however fewer parameters were successfully optimised in the presence of observational uncertainty. The results presented here suggest that DFO-LS is sufficiently inexpensive and robust to apply to the calibration of complex, global ocean biogeochemical models.
Abstract. Biogeochemical model behaviour for micronutrients is typically hard to constrain because of the sparsity of observational data, the difficulty of determining parameters in situ, and uncertainties in observations and models. Here, we assess the influence of data distribution, model uncertainty, and the misfit function on objective parameter optimisation in a model of the oceanic cycle of zinc (Zn), an essential micronutrient for marine phytoplankton with a long whole-ocean residence time. We aim to investigate whether observational constraints are sufficient for reconstruction of biogeochemical model behaviour, given that the Zn data coverage provided by the GEOTRACES Intermediate Data Product 2017 is sparse. Furthermore, we aim to assess how optimisation results are affected by the choice of the misfit function and by confounding factors such as analytical uncertainty in the data or biases in the model related to either seasonal variability or the larger-scale circulation. The model framework applied herein combines a marine Zn cycling model with a state-of-the-art estimation of distribution algorithm (Covariance Matrix Adaption Evolution Strategy, CMA-ES) to optimise the model towards synthetic data in an ensemble of 26 optimisations. Provided with a target field that can be perfectly reproduced by the model, optimisation retrieves parameter values perfectly regardless of data coverage. As differences between the model and the system underlying the target field increase, the choice of the misfit function can greatly impact optimisation results, while limitation of data coverage is in most cases of subordinate significance. In cases where optimisation to full or limited data coverage produces relatively distinct model behaviours, we find that applying a misfit metric that compensates for differences in data coverage between ocean basins considerably improves agreement between optimisation results obtained with the two data situations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.