A probabilistic forecast takes the form of a predictive probability distribution over future quantities or events of interest. Probabilistic forecasting aims to maximize the sharpness of the predictive distributions, subject to calibration, on the basis of the available information set. We formalize and study notions of calibration in a prediction space setting. In practice, probabilistic calibration can be checked by examining probability integral transform (PIT) histograms. Proper scoring rules such as the logarithmic score and the continuous ranked probability score serve to assess calibration and sharpness simultaneously. As a special case, consistent scoring functions provide decision-theoretically coherent tools for evaluating point forecasts. We emphasize methodological links to parametric and nonparametric distributional regression techniques, which attempt to model and to estimate conditional distribution functions; we use the context of statistically postprocessed ensemble forecasts in numerical weather prediction as an example. Throughout, we illustrate concepts and methodologies in data examples.
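As a concrete illustration, the sketch below computes PIT values (whose histogram should be approximately flat for a calibrated forecaster) and the closed-form CRPS, under the assumption of Gaussian predictive distributions; the function names are ours, not from the article.

```python
# Minimal sketch: probabilistic calibration (PIT) and sharpness/calibration
# (CRPS) checks, assuming Gaussian predictive distributions N(mu_i, sigma_i^2).
import numpy as np
from scipy import stats

def pit_values(y, mu, sigma):
    """Probability integral transform F_i(y_i); approximately uniform if
    the forecasts are probabilistically calibrated."""
    return stats.norm.cdf(y, loc=mu, scale=sigma)

def crps_gaussian(y, mu, sigma):
    """Closed-form continuous ranked probability score for a Gaussian
    predictive distribution (lower is better)."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * stats.norm.cdf(z) - 1)
                    + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi))

rng = np.random.default_rng(0)
mu, sigma = np.zeros(10_000), np.ones(10_000)
y = rng.normal(mu, sigma)                # the "ideal" forecaster
pit = pit_values(y, mu, sigma)           # histogram of pit should be flat
print("mean CRPS:", crps_gaussian(y, mu, sigma).mean())
```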
The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, and each implementation was subsequently run in a common computing environment. The methods were then compared in terms of various predictive diagnostics. Supplementary materials regarding implementation details of the methods and code are available for this article online at 10.1007/s13253-018-00348-w.
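To make the notion of predictive diagnostics concrete, here is a hedged sketch of the kind of held-out comparison such a competition might use (RMSE, empirical coverage, and the interval score for central 95% prediction intervals); these particular metrics and names are illustrative, not the competition's exact scoring protocol.

```python
# Illustrative predictive diagnostics at held-out locations, given point
# predictions and predictive standard deviations from a competing method.
import numpy as np

def interval_score(y, lower, upper, alpha=0.05):
    """Proper scoring rule for central (1 - alpha) prediction intervals:
    penalizes wide intervals and observations falling outside them."""
    width = upper - lower
    below = (2 / alpha) * (lower - y) * (y < lower)
    above = (2 / alpha) * (y - upper) * (y > upper)
    return width + below + above

def diagnostics(y, pred_mean, pred_sd):
    lower = pred_mean - 1.96 * pred_sd
    upper = pred_mean + 1.96 * pred_sd
    return {
        "RMSE": float(np.sqrt(np.mean((y - pred_mean) ** 2))),
        "95% coverage": float(np.mean((y >= lower) & (y <= upper))),
        "mean interval score": float(np.mean(interval_score(y, lower, upper))),
    }
```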
Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of high-resolution observations of spatial fields over large spatial regions. If these datasets can be efficiently exploited, they can provide new insights on a wide variety of issues. However, traditional spatial-statistical techniques such as kriging are not computationally feasible for big datasets. We propose a multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space. The M-RA process is specified as a linear combination of basis functions at multiple levels of spatial resolution, which can capture spatial structure from very fine to very large scales. The basis functions are automatically chosen to approximate a given covariance function, which can be nonstationary. All computations involving the M-RA, including parameter inference and prediction, are highly scalable for massive datasets. Crucially, the inference algorithms can also be parallelized to take full advantage of large distributed-memory computing environments. In comparisons using simulated data and a large satellite dataset, the M-RA outperforms a related state-of-the-art method.
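The full multi-resolution construction is beyond a short example, but the following sketch illustrates the underlying idea at a single resolution: representing the process as a linear combination of spatial basis functions, so that all computations scale with the (small) number of basis functions rather than the number of observations. The basis, covariance, and function names here are illustrative assumptions, not the M-RA itself.

```python
# Single-resolution analogue of the basis-function idea: y = B w + noise,
# with w ~ N(0, K), so posterior computations cost O(n r^2) for r basis
# functions instead of O(n^3) for n observations.
import numpy as np

def gaussian_basis(locs, centers, bandwidth):
    """Local Gaussian bumps as a simple spatial basis (n x r matrix B)."""
    d = locs[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def lowrank_predict(y, B_obs, B_pred, K, tau2):
    """Posterior mean of B_pred @ w given y = B_obs @ w + N(0, tau2 I)."""
    A = np.linalg.inv(K) + B_obs.T @ B_obs / tau2   # r x r posterior precision
    w_hat = np.linalg.solve(A, B_obs.T @ y / tau2)  # posterior mean of w
    return B_pred @ w_hat

# toy 1-D example
rng = np.random.default_rng(1)
s_obs = np.sort(rng.uniform(0, 1, 500))
s_pred = np.linspace(0, 1, 200)
centers = np.linspace(0, 1, 30)
B_obs = gaussian_basis(s_obs, centers, 0.05)
B_pred = gaussian_basis(s_pred, centers, 0.05)
y = np.sin(8 * s_obs) + 0.1 * rng.standard_normal(500)
f_hat = lowrank_predict(y, B_obs, B_pred, np.eye(30), 0.01)
```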
Gaussian processes (GPs) are commonly used as models for functions, time series, and spatial fields, but they are computationally infeasible for large datasets. Focusing on the typical setting of modeling data as a GP plus an additive noise term, we propose a generalization of the Vecchia (J. Roy. Statist. Soc. Ser. B 50 (1988) 297-312) approach as a framework for GP approximations. We show that our general Vecchia approach contains many popular existing GP approximations as special cases, allowing for comparisons among the different methods within a unified framework. Representing the models by directed acyclic graphs, we determine the sparsity of the matrices necessary for inference, which leads to new insights regarding the computational properties. Based on these results, we propose a novel sparse general Vecchia approximation, which ensures computational feasibility for large spatial datasets but can lead to considerable improvements in approximation accuracy over Vecchia's original approach. We provide several theoretical results and conduct numerical comparisons. We conclude with guidelines for the use of Vecchia approximations in spatial statistics.
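As a minimal sketch of Vecchia's original idea, the following code evaluates an approximate Gaussian log-likelihood in which each ordered observation conditions only on at most m previously ordered nearest neighbors, so the joint density factors into cheap univariate conditionals. The exponential covariance, 1-D coordinate ordering, and names are illustrative choices, not the paper's general framework.

```python
# Vecchia-style approximate log-likelihood for a zero-mean GP in 1-D.
import numpy as np

def expcov(locs_a, locs_b, range_=0.2):
    d = np.abs(locs_a[:, None] - locs_b[None, :])
    return np.exp(-d / range_)

def vecchia_loglik(y, locs, m=10):
    order = np.argsort(locs)            # simple coordinate ordering (1-D)
    y, locs = y[order], locs[order]
    ll = 0.0
    for i in range(len(y)):
        # nearest m previously ordered points form the conditioning set
        past = locs[:i]
        nn = np.argsort(np.abs(past - locs[i]))[:m]
        if nn.size == 0:
            mu, var = 0.0, expcov(locs[i:i+1], locs[i:i+1])[0, 0]
        else:
            C_nn = expcov(past[nn], past[nn])
            c = expcov(past[nn], locs[i:i+1])[:, 0]
            w = np.linalg.solve(C_nn, c)
            mu = w @ y[:i][nn]          # conditional mean
            var = expcov(locs[i:i+1], locs[i:i+1])[0, 0] - w @ c
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll

# usage: exact GP draw on 500 points, then approximate likelihood
rng = np.random.default_rng(0)
locs = rng.uniform(0, 1, 500)
L = np.linalg.cholesky(expcov(locs, locs) + 1e-8 * np.eye(500))
y = L @ rng.standard_normal(500)
print(vecchia_loglik(y, locs, m=10))
```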
The use of satellite measurements in climate studies promises many new scientific insights if those data can be efficiently exploited. Due to sparseness of daily data sets, there is a need to fill spatial gaps and to borrow strength from adjacent days. Nonetheless, these satellites are typically capable of conducting on the order of 100,000 retrievals per day, which makes it impossible to apply traditional spatio-temporal statistical methods, even in supercomputing environments. To overcome these challenges, we make use of a spatio-temporal mixed-effects model. For each massive daily data set, dimension reduction is achieved by essentially modelling the underlying process as a linear combination of spatial basis functions on the globe. The application of a dynamical autoregressive model in time, over the reduced space, allows rapid sequential computation of optimal smoothing predictions via the Kalman smoother; this is known as Fixed Rank Smoothing (FRS). The dimension-reduced mixed-effects model contains a number of unknown parameters, including covariance and propagator matrices, which describe the spatial and temporal dependence structure in the reduced-dimensional process. We take an empirical-Bayes approach to inference, which involves estimating the parameters and substituting them into the optimal predictors. Method-of-moments (MM) parameter estimation (currently used in FRS) is typically inefficient compared to maximum likelihood (ML) estimation and can result in large sampling variability. Here, we develop ML estimation via an expectation-maximization (EM) algorithm, which offers stable computation of valid estimators and makes efficient use of spatial and temporal dependence in the data. The two parameter-estimation approaches, MM and ML, are compared in a simulation study. We also apply our methodology to global satellite CO2 measurements: We optimally smooth the sparse daily CO2 maps obtained by the Atmospheric InfraRed Sounder (AIRS) instrument on the Aqua satellite; then, using FRS with EM-estimated parameters, a complete sequence of the daily global CO2 fields can be obtained, together with their associated prediction uncertainties.
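For intuition about the reduced-space computations, here is a textbook Kalman filter with a Rauch-Tung-Striebel backward pass for a state-space model on the basis-function coefficients; it is a generic sketch under assumed model matrices (H, U, B_t, V_t), not the authors' FRS implementation or their EM algorithm.

```python
# Generic Kalman filter + RTS smoother for reduced-dimensional coefficients:
#   eta_t = H eta_{t-1} + N(0, U),   y_t = B_t eta_t + N(0, V_t).
# Cost is governed by the small basis dimension r, not the data size.
import numpy as np

def kalman_smoother(ys, Bs, H, U, Vs, mu0, S0):
    """ys, Bs, Vs are lists over time; returns smoothed means/covariances."""
    T = len(ys)
    mf, Sf, mp, Sp = [], [], [], []
    m, S = mu0, S0
    for t in range(T):
        # predict one step ahead (prior at t = 0)
        m_p = H @ m if t > 0 else mu0
        S_p = H @ S @ H.T + U if t > 0 else S0
        # update with day-t data
        B, V = Bs[t], Vs[t]
        K = S_p @ B.T @ np.linalg.inv(B @ S_p @ B.T + V)
        m = m_p + K @ (ys[t] - B @ m_p)
        S = S_p - K @ B @ S_p
        mp.append(m_p)
        Sp.append(S_p)
        mf.append(m)
        Sf.append(S)
    # backward (RTS) smoothing pass
    ms, Ss = [None] * T, [None] * T
    ms[-1], Ss[-1] = mf[-1], Sf[-1]
    for t in range(T - 2, -1, -1):
        J = Sf[t] @ H.T @ np.linalg.inv(Sp[t + 1])
        ms[t] = mf[t] + J @ (ms[t + 1] - mp[t + 1])
        Ss[t] = Sf[t] + J @ (Ss[t + 1] - Sp[t + 1]) @ J.T
    return ms, Ss
```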