Significance

Current climate models are too coarse to resolve many of the atmosphere’s most important processes. Traditionally, these subgrid processes are heuristically approximated in so-called parameterizations. However, imperfections in these parameterizations, especially for clouds, have impeded progress toward more accurate climate predictions for decades. Cloud-resolving models alleviate many of the gravest issues of their coarse counterparts but will remain too computationally demanding for climate change predictions for the foreseeable future. Here we use deep learning to leverage the power of short-term cloud-resolving simulations for climate modeling. Our data-driven model is fast and accurate, thereby showing the potential of machine-learning–based approaches to climate model development.
Ensemble weather predictions require statistical post-processing of systematic errors to obtain reliable and accurate probabilistic forecasts. Traditionally, this is accomplished with distributional regression models in which the parameters of a predictive distribution are estimated from a training period. We propose a flexible alternative based on neural networks that can incorporate nonlinear relationships between arbitrary predictor variables and forecast distribution parameters that are automatically learned in a data-driven way rather than requiring pre-specified link functions. In a case study of 2-meter temperature forecasts at surface stations in Germany, the neural network approach significantly outperforms benchmark post-processing methods while being computationally more affordable. Key components of this improvement are the use of auxiliary predictor variables and station-specific information with the help of embeddings. Furthermore, the trained neural network can be used to gain insight into the importance of meteorological variables, thereby challenging the notion of neural networks as uninterpretable black boxes. Our approach can easily be extended to other statistical post-processing and forecasting problems. We anticipate that recent advances in deep learning combined with the ever-increasing amounts of model and observation data will transform the post-processing of numerical weather forecasts in the coming decade.
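A network of the kind described above emits the parameters of a predictive distribution (for temperature, typically the mean and standard deviation of a Gaussian) and is trained by minimizing a proper scoring rule such as the continuous ranked probability score (CRPS). A minimal sketch of the closed-form CRPS for a Gaussian forecast; the function name is illustrative, not from the paper:

```python
import math

def crps_gaussian(mu: float, sigma: float, y: float) -> float:
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2)
    against a scalar observation y (lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

# A sharper forecast centred on the observation scores better:
print(crps_gaussian(0.0, 1.0, 0.0))  # ≈ 0.2337
print(crps_gaussian(0.0, 0.5, 0.0))  # ≈ 0.1168
```

Because this expression is differentiable in mu and sigma, it can serve directly as the training loss of the network.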
Representing unresolved moist convection in coarse‐scale climate models remains one of the main bottlenecks of current climate simulations. Many of the biases present with parameterized convection are strongly reduced when convection is explicitly resolved (i.e., in cloud-resolving models at high spatial resolution, approximately a kilometer). We here present a novel approach to convective parameterization based on machine learning, using an aquaplanet with prescribed sea surface temperatures as a proof of concept. A deep neural network is trained with a superparameterized version of a climate model in which convection is resolved by thousands of embedded 2‐D cloud-resolving models. The machine learning representation of convection, which we call the Cloud Brain (CBRAIN), can skillfully predict many of the convective heating, moistening, and radiative features of superparameterization that are most important to climate simulation, although an unintended side effect is to reduce some of the superparameterization's inherent variance. Since as few as three months' high‐frequency global training data prove sufficient to provide this skill, the approach presented here opens up a new possibility for a future class of convection parameterizations in climate models that are built “top‐down,” that is, by learning salient features of convection from unusually explicit simulations.
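Schematically, the mapping such a network learns takes a vertical column of atmospheric state variables and returns the corresponding heating and moistening tendencies. A toy forward pass of a one-hidden-layer network, in pure Python; the layer sizes and names are illustrative only, and the actual CBRAIN architecture is deeper:

```python
import random

def mlp_forward(column_state, w1, b1, w2, b2):
    """One-hidden-layer MLP: column state -> hidden (ReLU) -> tendencies."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, column_state)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

random.seed(0)
n_in, n_hid, n_out = 94, 32, 65  # e.g. T/q/radiation profiles in, tendencies out
w1 = [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [0.0] * n_hid
w2 = [[random.gauss(0, 0.1) for _ in range(n_hid)] for _ in range(n_out)]
b2 = [0.0] * n_out

tendencies = mlp_forward([0.0] * n_in, w1, b1, w2, b2)
print(len(tendencies))  # 65, one value per output level and variable
```

In practice the weights are of course fitted to the high-frequency superparameterized training data rather than drawn at random.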
• Benchmarks with strong baselines are a key ingredient for rapid progress on a problem.
• Here, we define a benchmark for data-driven global, medium-range weather prediction.
• The data is processed for convenient use in machine learning models and a quickstart guide is provided.
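Scoring on a regular global grid must account for the fact that grid cells shrink toward the poles; WeatherBench's headline metric is therefore a latitude-weighted RMSE. A schematic pure-Python sketch (the function name is ours, not the benchmark's):

```python
import math

def lat_weighted_rmse(forecast, truth, lats_deg):
    """RMSE over a (lat, lon) grid, weighting each latitude row by
    cos(latitude) normalized to mean 1, as is standard for lat-lon grids."""
    w = [math.cos(math.radians(lat)) for lat in lats_deg]
    w_mean = sum(w) / len(w)
    total, count = 0.0, 0
    for wi, frow, trow in zip(w, forecast, truth):
        for f, t in zip(frow, trow):
            total += (wi / w_mean) * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)

truth = [[280.0, 281.0], [290.0, 291.0]]
biased = [[281.0, 282.0], [291.0, 292.0]]  # uniform +1 K error
print(lat_weighted_rmse(biased, truth, [60.0, 0.0]))  # 1.0
```

A uniform error survives the weighting unchanged, while errors confined to high latitudes are down-weighted relative to the tropics.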
Numerical weather prediction has traditionally been based on models that discretize the dynamical and physical equations of the atmosphere. Recently, however, the rise of deep learning has created increased interest in purely data‐driven medium‐range weather forecasting, with early studies exploring the feasibility of such an approach. To accelerate progress in this area, the WeatherBench benchmark challenge was defined. Here, we train a deep residual convolutional neural network (Resnet) to predict geopotential, temperature and precipitation at 5.625° resolution up to 5 days ahead. To avoid overfitting and improve forecast skill, we pretrain the model using historical climate model output before fine‐tuning on reanalysis data. The resulting forecasts outperform previous submissions to WeatherBench and are comparable in skill to a physical baseline at similar resolution. We also analyze how the neural network creates its predictions and find that, for the case studies analyzed, the model has learned physically reasonable correlations. Finally, we perform scaling experiments to estimate the potential skill of data‐driven approaches at higher resolutions.
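The defining feature of the residual network mentioned above is that each block adds a learned correction to its input, y = x + f(x), which is what makes deep stacks trainable. A language-agnostic sketch of one residual step on a feature vector (purely illustrative, not the paper's implementation):

```python
def residual_step(x, f):
    """Apply one residual block: output = input + learned correction f(x)."""
    fx = f(x)
    return [xi + fi for xi, fi in zip(x, fx)]

# With a zero-initialized correction the block is the identity map,
# which is why residual stacks are well-behaved at the start of training:
identity_out = residual_step([1.0, 2.0, 3.0], lambda x: [0.0] * len(x))
print(identity_out)  # [1.0, 2.0, 3.0]
```

The pretraining strategy in the abstract fits this picture: the climate-model output teaches the blocks coarse corrections, and fine-tuning on reanalysis refines them.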
The relative contributions of soil moisture heterogeneities, a stochastic boundary‐layer perturbation scheme, and varied aerosol concentrations (representing microphysical uncertainties) to the diurnal cycle of convective precipitation and its spatial variability are examined, conditional on the prevailing weather regime. To achieve this, separate perturbed‐parameter ensemble simulations are performed with the Consortium for Small‐scale Modeling (COSMO) model at convection‐permitting horizontal grid spacing for 10 days during a high‐impact weather episode in 2016 in Central Europe. We consider hourly precipitation amounts and their spatial distribution, focus on ensemble mean and spread aggregated over strong and weak forcing conditions, and employ spatial evaluation techniques. The convective adjustment time‐scale diagnostic is used to distinguish the different precipitation regimes. While the total amount of daily precipitation is hardly changed by the different perturbation approaches (less than 5%), the spatial variability of precipitation exhibits clear differences. Soil moisture heterogeneity primarily introduces variability during convection initiation, causing a steeper increase in normalized rainfall spread prior to the onset of afternoon precipitation. The stochastic boundary‐layer perturbations lead to the largest spatial variability, impacting precipitation from the initial time onwards with an amplitude comparable to the operational ensemble spread. Similarly, perturbed aerosol concentrations impact spatial precipitation variability from the model start onwards, but to a smaller degree. Soil moisture heterogeneity shows the strongest weather regime dependence, with the greatest impact on convection during weak synoptic forcing. All types of perturbation increase dispersion of precipitation while maintaining the domain‐averaged precipitation rates.
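The spread diagnostics above reduce to comparing an ensemble's standard deviation with its mean across members at each hour. A minimal sketch of a normalized spread measure (the function name is ours; the paper's exact normalization may differ):

```python
import statistics

def normalized_spread(members):
    """Ensemble standard deviation divided by the ensemble mean,
    e.g. for hourly precipitation from perturbed-parameter members."""
    mean = statistics.fmean(members)
    if mean == 0.0:
        return 0.0
    return statistics.stdev(members) / mean

print(normalized_spread([2.0, 2.0, 2.0]))  # 0.0: members agree exactly
print(normalized_spread([1.0, 2.0, 3.0]))  # 0.5: stdev 1.0 over mean 2.0
```

Normalizing by the mean lets spread be compared across hours with very different rain amounts, which is what makes the "steeper increase prior to afternoon onset" signal visible.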
Abstract. Over the last couple of years, machine learning parameterizations have emerged as a potential way to improve the representation of subgrid processes in Earth system models (ESMs). So far, all studies have been based on the same three-step approach: first a training dataset was created from a high-resolution simulation, then a machine learning algorithm was fitted to this dataset, before the trained algorithm was implemented in the ESM. The resulting online simulations were frequently plagued by instabilities and biases. Here, coupled online learning is proposed as a way to combat these issues. Coupled learning can be seen as a second training stage in which the pretrained machine learning parameterization, specifically a neural network, is run in parallel with a high-resolution simulation. The high-resolution simulation is kept in sync with the neural network-driven ESM through constant nudging. This enables the neural network to learn from the tendencies that the high-resolution simulation would produce if it experienced the states the neural network creates. The concept is illustrated using the Lorenz 96 model, where coupled learning is able to recover the “true” parameterizations. Further, detailed algorithms for the implementation of coupled learning in 3D cloud-resolving models and the superparameterization framework are presented. Finally, outstanding challenges and issues not resolved by this approach are discussed.
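The constant nudging described above amounts to adding a relaxation term -(x - x_ref)/τ to each model tendency, pulling the high-resolution state toward the reference state at timescale τ. A minimal sketch using the single-scale Lorenz 96 system, the same toy model the paper uses as a testbed (the nudged form shown here is schematic):

```python
def l96_tendency(x, forcing=8.0):
    """Single-scale Lorenz 96: dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F,
    with periodic indexing over the n variables."""
    n = len(x)
    return [(x[(i + 1) % n] - x[(i - 2) % n]) * x[(i - 1) % n] - x[i] + forcing
            for i in range(n)]

def nudged_tendency(x, x_ref, tau, forcing=8.0):
    """L96 tendency plus relaxation toward a reference state x_ref."""
    free = l96_tendency(x, forcing)
    return [d - (xi - ri) / tau for d, xi, ri in zip(free, x, x_ref)]

# The uniform state x_i = F is a fixed point of L96, and nudging toward the
# current state adds nothing, so every tendency vanishes:
x = [8.0] * 36
print(max(abs(v) for v in nudged_tendency(x, x, tau=1.0)))  # 0.0
```

In the coupled-learning setting, x_ref would be the state of the neural-network-driven model, so the high-resolution run stays close to the trajectories the network actually produces.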