Abstract. Computer simulators can be computationally intensive to run over a large number of input values, as required for optimization and various uncertainty quantification tasks. The standard paradigm for the design and analysis of computer experiments is to employ Gaussian random fields to model computer simulators. Gaussian process models are trained on input-output data obtained from simulation runs at various input values. Following this approach, we propose a sequential design algorithm, MICE (mutual information for computer experiments), that adaptively selects the input values at which to run the computer simulator in order to maximize the expected information gain (mutual information) over the input space. The superior computational efficiency of the MICE algorithm compared to other algorithms is demonstrated on test functions and on a tsunami simulator, with overall gains of up to 20% in the latter case.

Key words. active learning, best linear unbiased prediction, Gaussian process, shallow water equations

AMS subject classifications. 60G15, 62M20, 62K99, 65Y20, 91B74

DOI. 10.1137/140989613

1. Introduction. Computer experiments are widely employed to study physical processes [31, 36] and involve running a computer simulator which mimics the physical process at various input values. When the computer simulator is computationally expensive to run, say, for minutes, hours, or even days, often on a high performance cluster, only a limited number of simulation runs can be afforded, making the planning of such experiments all the more important. Surrogate models, also known as emulators, are often used as a means for designing and analyzing computer experiments [31]. Emulators are statistical models that approximate the input-output behavior of computer simulators in order to make probabilistic predictions. In this setting, we want to find a design of computer experiments that leads, with minimal computational effort, to a surrogate model with a good overall fit. We restrict our attention to deterministic computer simulators with a scalar output. In the design of experiments it is customary to use space-filling designs [36] such as uniform designs, multilayer designs, maximin (Mm)- and minimax (mM)-distance designs, and Latin hypercube designs (LHD). Space-filling designs treat all regions of the design space as equally important, but are "one shot" designs that may waste computations over unnecessary regions of the input space. A variety of adaptive designs have been proposed which can take advantage of information collected during the experimental design process [21, 31], typically in the form of input-output data.
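To make the sequential design idea concrete, the following is a minimal sketch of one greedy MICE-style selection step with a zero-mean Gaussian process: the next run is the candidate input that maximizes the ratio of the predictive variance given the current design to the predictive variance given the remaining candidates (with a nugget). The squared-exponential kernel, its hyperparameters, and the nugget value tau2 are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def posterior_var(x, X, nugget=1e-6):
    """GP predictive variance at points x, given design points X."""
    K = sq_exp_kernel(X, X) + nugget * np.eye(len(X))
    k = sq_exp_kernel(x, X)
    prior = sq_exp_kernel(x, x).diagonal()
    return prior - np.einsum('ij,ji->i', k, np.linalg.solve(K, k.T))

def mice_step(X_design, X_cand, tau2=1.0):
    """Greedy MICE step: maximize var(x | design) / var(x | other candidates)."""
    scores = []
    for j in range(len(X_cand)):
        x = X_cand[j:j + 1]
        others = np.delete(X_cand, j, axis=0)
        num = posterior_var(x, X_design)[0]
        den = posterior_var(x, others, nugget=tau2)[0]
        scores.append(num / den)
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
X_design = rng.random((5, 2))   # runs completed so far (toy design)
X_cand = rng.random((50, 2))    # candidate inputs for the next run
print(mice_step(X_design, X_cand))
```

In practice the kernel hyperparameters would be re-estimated as simulation runs accumulate, and the candidate set refreshed at each iteration.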
SUMMARY. In this article, we propose a new technique for ozone forecasting. The approach is functional; that is, we consider stochastic processes with values in function spaces. We exploit an essential characteristic of this type of phenomenon by taking into account, both theoretically and practically, the continuous-time evolution of pollution. A main methodological contribution of this article is the incorporation of exogenous variables (wind speed and temperature) into these models. The application is carried out on a six-year data set of hourly ozone concentrations and meteorological measurements from Béthune (France). The study examines the summer periods because of the higher ozone values observed then. We explain the nonparametric estimation procedure for autoregressive Hilbertian models with or without exogenous variables (considering two alternative versions in the exogenous case), as well as for the functional kernel model. The comparison of all these models is based on predictions of hourly ozone concentrations up to 24 hours ahead. We analyze the daily forecast curves according to criteria of two kinds: functional ones, and aggregated ones where attention is put on the daily maximum. It appears that autoregressive Hilbertian models with exogenous variables show the best predictive power.
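As an illustration of the autoregressive Hilbertian idea, the sketch below forecasts the next daily curve from past daily curves via a truncated principal-components estimate of the autocorrelation operator. It omits the exogenous variables and uses a simple least-squares score regression rather than the paper's nonparametric estimator; the truncation level q and the placeholder data are assumptions.

```python
import numpy as np

def arh1_predict(curves, q=4):
    """One-day-ahead forecast of a daily curve via a truncated ARH(1) model."""
    mu = curves.mean(axis=0)
    Z = curves - mu                        # centred daily curves
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:q].T                           # first q principal components (24 x q)
    S = Z @ V                              # daily component scores
    # Least-squares fit of the score autoregression S[t+1] ~ S[t] A.
    A, *_ = np.linalg.lstsq(S[:-1], S[1:], rcond=None)
    return mu + (S[-1] @ A) @ V.T          # forecast for the next day

rng = np.random.default_rng(1)
days = rng.random((200, 24))               # placeholder data: 200 days x 24 hours
print(arh1_predict(days).shape)            # (24,) hourly forecast curve
```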
Abstract. High accuracy complex computer models, also called simulators, require large resources in time and memory to produce realistic results. Statistical emulators are computationally cheap approximations of such simulators. They can be built to replace simulators for various purposes, such as the propagation of uncertainties from inputs to outputs or the calibration of some internal parameters against observations. However, when the input space is of high dimension, the construction of an emulator can become prohibitively expensive. In this paper, we introduce a joint framework merging emulation with dimension reduction in order to overcome this hurdle. The gradient-based kernel dimension reduction technique is chosen for its ability to drastically decrease dimensionality with little loss of information. The Gaussian process emulation technique is combined with this dimension reduction approach. Theoretical properties of the approximation are explored. Our proposed approach provides an answer to the dimension reduction issue in emulation for a wide range of simulation problems that cannot be tackled using existing methods. The efficiency and accuracy of the proposed framework are demonstrated theoretically and compared with other methods on an elliptic partial differential equation (PDE) problem. We finally present a realistic application to tsunami modeling. The uncertainties in the bathymetry (seafloor elevation) are modeled as high-dimensional realizations of a spatial process using a geostatistical approach. Our dimension-reduced emulation enables us to compute the impact of these uncertainties on the resulting possible tsunami wave heights near shore and on shore. Considering an uncertain earthquake source, we observe a significant increase in the spread of the uncertainties in the tsunami heights due to the contribution of the bathymetry uncertainties to the overall uncertainty budget. These results highlight the need to include the effect of bathymetry uncertainties in tsunami early warnings and risk assessments.
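To illustrate the combination described above, here is a minimal sketch of gradient-based kernel dimension reduction (gKDR) followed by GP emulation on the projected inputs. The Gaussian kernel bandwidths sx and sy, the regularisation eps, the target dimension r, and the toy simulator are all illustrative assumptions; a production implementation would tune these, e.g. by cross-validation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def gkdr_directions(X, y, r, sx=1.0, sy=1.0, eps=1e-3):
    """Top-r projection directions from gradient-based kernel dimension reduction."""
    n, d = X.shape
    Gx = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * sx ** 2))
    Gy = np.exp(-((y[:, None] - y[None, :]) ** 2) / (2 * sy ** 2))
    Ginv = np.linalg.inv(Gx + n * eps * np.eye(n))
    H = Ginv @ Gy @ Ginv
    M = np.zeros((d, d))
    for i in range(n):
        D = Gx[:, i:i + 1] * (X - X[i]) / sx ** 2   # grad of k_X(x_j, .) at x_i
        M += D.T @ H @ D
    _, vecs = np.linalg.eigh(M / n)
    return vecs[:, -r:]                             # eigenvectors, largest last

rng = np.random.default_rng(2)
X = rng.random((100, 10))                           # toy high-dimensional inputs
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]             # toy scalar simulator output
B = gkdr_directions(X, y, r=2)
gp = GaussianProcessRegressor(normalize_y=True).fit(X @ B, y)
print(gp.predict((X @ B)[:3]))                      # emulator on reduced inputs
```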
Statistical methods constitute a useful approach to understanding and quantifying the uncertainty that governs complex tsunami mechanisms. Numerical simulations, however, often have a high computational cost. This is a limiting factor for performing uncertainty and sensitivity analyses, where numerous simulations are required. Statistical emulators, as surrogates of these simulators, can provide predictions of the physical process in a much faster and computationally inexpensive way. They offer a practical way to explore thousands of scenarios that would otherwise be numerically expensive and difficult to achieve. In this work, we build a statistical emulator of the deterministic codes used to simulate submarine sliding and tsunami generation at the Rockall Bank, NE Atlantic Ocean, in two stages. First, we calibrate, against observations of the landslide deposits, the parameters used in the landslide simulations. This calibration is performed in a Bayesian framework, using Gaussian process (GP) emulators to approximate the landslide model and the discrepancy function between model and observations. Distributions of the calibrated input parameters are obtained as a result of the calibration. In a second step, a GP emulator is built to mimic the coupled landslide-tsunami numerical process. This emulator propagates the uncertainties in the distributions of the calibrated input parameters inferred in the first step to the outputs. As a result, a quantification of the uncertainty in the maximum free surface elevation at specified locations is obtained.
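The following is a minimal two-stage sketch in the spirit of this workflow: GP-based calibration of a simulator parameter, then propagation of the calibrated posterior through a second emulator. All data, the uniform prior, and the observation-noise level are toy placeholders, and the discrepancy function of the full Bayesian treatment is omitted for brevity.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(3)

# Stage 1: emulate the landslide code and calibrate its input theta.
theta_train = rng.uniform(0, 1, (30, 1))      # design on the calibration input
deposit = np.sin(4 * theta_train[:, 0])       # toy "landslide deposit" output
em1 = GaussianProcessRegressor(normalize_y=True).fit(theta_train, deposit)

obs, sigma = 0.6, 0.05                        # toy observation and noise sd
grid = np.linspace(0, 1, 500)[:, None]        # uniform prior over [0, 1]
loglik = -0.5 * ((em1.predict(grid) - obs) / sigma) ** 2
post = np.exp(loglik - loglik.max())
post /= post.sum()                            # discretised posterior on the grid

# Stage 2: propagate posterior samples through a tsunami emulator.
wave = 2.0 + np.cos(3 * theta_train[:, 0])    # toy max free-surface elevation
em2 = GaussianProcessRegressor(normalize_y=True).fit(theta_train, wave)
theta_samples = grid[rng.choice(len(grid), 1000, p=post)]
eta_max = em2.predict(theta_samples)
print(eta_max.mean(), eta_max.std())          # propagated output uncertainty
```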
Tsunamis are unpredictable and infrequent, but potentially high-impact, natural disasters. To prepare for tsunamis and to mitigate and prevent the losses they cause, probabilistic hazard and risk analysis methods have been developed and have proved useful. However, large gaps and uncertainties still exist, and many steps in the assessment methods lack information, theoretical foundations, or commonly accepted methods. Moreover, the applied methods have very different levels of maturity, from the already advanced probabilistic tsunami hazard analysis for earthquake sources to the less mature probabilistic risk analysis. In this review we give an overview of the current state of probabilistic tsunami hazard and risk analysis. Identifying research gaps, we offer suggestions for future research directions. An extensive literature list allows for branching into diverse aspects of this scientific approach.
Abstract. Due to the catastrophic consequences of tsunamis, early warnings need to be issued quickly in order to mitigate the hazard. Additionally, there is a need to represent the uncertainty in the predictions of tsunami characteristics arising from uncertain trigger features (e.g., the position, shape, and speed of a landslide, or the sea-floor deformation associated with an earthquake). Unfortunately, computer models are expensive to run. This leads to significant delays in predictions and makes uncertainty quantification impractical. Statistical emulators run almost instantaneously and can represent the outputs of the computer model well. In this paper, we use the outer product emulator to build a fast statistical surrogate of a landslide-generated tsunami computer model. This Bayesian framework enables us to build the emulator by combining prior knowledge of the computer model's properties with a few carefully chosen model evaluations. The good performance of the emulator is validated using the leave-one-out method.
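The sketch below shows the leave-one-out validation step mentioned above: each model run is held out in turn, the emulator is refit on the rest, and the held-out output is predicted and standardised by the predictive uncertainty. The design and outputs are toy placeholders, and a standard GP stands in for the paper's outer product emulator.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def loo_errors(X, y):
    """Standardised leave-one-out prediction errors for a GP emulator."""
    errs = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        gp = GaussianProcessRegressor(normalize_y=True).fit(X[keep], y[keep])
        mean, sd = gp.predict(X[i:i + 1], return_std=True)
        errs.append((y[i] - mean[0]) / sd[0])   # roughly N(0, 1) if well calibrated
    return np.array(errs)

rng = np.random.default_rng(4)
X = rng.random((25, 3))                          # toy design of model runs
y = np.sin(X @ np.array([3.0, 1.0, 0.5]))        # toy tsunami-model output
print(np.abs(loo_errors(X, y)).max())            # values far above ~2 flag misfit
```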
We consider the functional linear regression model in which the explanatory variable is a random surface and the response is a real random variable, in various situations where both the explanatory variable and the noise can be unbounded and dependent. The random surfaces are represented by bivariate splines over triangulations, and we use this representation to construct least squares estimators of the regression function with a penalisation term. Under the assumption that the regressors in the sample span a large enough space of functions, the approximation properties of bivariate splines yield the consistency of the estimators. Simulations demonstrate the quality of the asymptotic properties on a realistic domain. We also carry out an application to ozone concentration forecasting over the USA that illustrates the predictive skill of the method.
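As a simplified illustration of the penalised scalar-on-surface regression above, the sketch below represents the coefficient surface directly on the observation grid with a discrete Laplacian roughness penalty, rather than with the paper's bivariate splines over triangulations; the smoothing parameter lam and the toy surfaces are assumptions.

```python
import numpy as np

def second_diff(m):
    """Second-difference matrix used to build a roughness penalty."""
    D = np.zeros((m - 2, m))
    for i in range(m - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def fit_surface_regression(surfaces, y, lam=1.0):
    """Penalised least squares for y_i = <X_i, beta> + noise, on a grid."""
    n, g1, g2 = surfaces.shape
    Z = surfaces.reshape(n, g1 * g2) / (g1 * g2)   # simple quadrature weights
    D1, D2 = second_diff(g1), second_diff(g2)
    # Discrete Laplacian-type penalty encouraging a smooth coefficient surface.
    P = np.kron(D1.T @ D1, np.eye(g2)) + np.kron(np.eye(g1), D2.T @ D2)
    beta = np.linalg.solve(Z.T @ Z + lam * P, Z.T @ y)
    return beta.reshape(g1, g2)

rng = np.random.default_rng(5)
Xs = rng.random((80, 12, 12))                      # 80 toy surfaces on a 12x12 grid
y = Xs[:, 3:6, 3:6].mean(axis=(1, 2)) + 0.01 * rng.standard_normal(80)
print(fit_surface_regression(Xs, y).shape)         # (12, 12) coefficient surface
```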