2022
DOI: 10.48550/arxiv.2205.00865
Preprint
WeatherBench Probability: A benchmark dataset for probabilistic medium-range weather forecasting along with deep learning baseline models

Abstract: WeatherBench is a benchmark dataset for medium-range weather forecasting of geopotential, temperature and precipitation, consisting of preprocessed data, predefined evaluation metrics and a number of baseline models. WeatherBench Probability extends this to probabilistic forecasting by adding a set of established probabilistic verification metrics (continuous ranked probability score, spread-skill ratio and rank histograms) and a state-of-the-art operational baseline using the ECMWF IFS ensemble forecast. In a…
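The abstract names three ensemble verification metrics: the continuous ranked probability score (CRPS), the spread-skill ratio, and rank histograms. As a minimal sketch of the standard ensemble estimators (not the benchmark's actual API; the function names and array conventions here are assumptions of this illustration):

```python
import numpy as np

def crps_ensemble(members, obs):
    """Ensemble CRPS estimator: E|X - y| - 0.5 * E|X - X'|,
    computed per forecast case; lower is better."""
    skill = np.abs(members - obs).mean(axis=0)
    spread = np.abs(members[:, None] - members[None, :]).mean(axis=(0, 1))
    return skill - 0.5 * spread

def spread_skill_ratio(members, obs):
    """RMS ensemble spread divided by the RMSE of the ensemble mean;
    close to 1 for a well-dispersed ensemble."""
    spread = members.std(axis=0, ddof=1)
    rmse = np.sqrt(((members.mean(axis=0) - obs) ** 2).mean())
    return np.sqrt((spread ** 2).mean()) / rmse

def rank_histogram(members, obs):
    """Counts of the observation's rank within each sorted ensemble;
    flat for a reliable ensemble, U-shaped when under-dispersive."""
    ranks = (members < obs).sum(axis=0)
    return np.bincount(ranks.ravel(), minlength=members.shape[0] + 1)

# Toy usage: 50 members drawn from the same distribution as the
# observations, so the ensemble should be roughly calibrated.
rng = np.random.default_rng(0)
ens = rng.normal(size=(50, 1000))
y = rng.normal(size=1000)
print(crps_ensemble(ens, y).mean(), spread_skill_ratio(ens, y))
```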

Cited by 9 publications (11 citation statements)
References 10 publications
“…This is quantified by the large PIT D statistics (Figure 10b), a lack of strong correlation between the IQR and deterministic error (Figure 10c), and IQR capture fractions far below 0.5 (Figure 10d). This failure of MC Dropout was also noted by Garg et al (2022) for medium-range weather forecasts. The BNN performs much more similarly to the simple SHASH, and in some cases may even perform better, although only marginally so.…”
Section: Comparison With Alternative Neural Network Uncertainty Appro… (supporting)
confidence: 53%
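The two calibration diagnostics quoted above can likewise be sketched. This is one plausible reading under stated assumptions: the helper names are hypothetical, and the PIT D statistic is taken here as the mean absolute deviation of the PIT histogram from uniformity, which may differ from the cited work's exact definition.

```python
import numpy as np

def iqr_capture_fraction(members, obs):
    """Fraction of observations falling inside the predicted
    interquartile range; a calibrated forecast captures about 0.5."""
    q25, q75 = np.percentile(members, [25, 75], axis=0)
    return ((obs >= q25) & (obs <= q75)).mean()

def pit_d_statistic(members, obs, n_bins=10):
    """Deviation of the PIT histogram from uniformity.
    NOTE: assumed definition (mean absolute deviation of bin
    frequencies from 1/n_bins); the cited paper may define D
    differently."""
    pit = (members <= obs).mean(axis=0)  # empirical CDF evaluated at obs
    freq, _ = np.histogram(pit, bins=n_bins, range=(0.0, 1.0))
    freq = freq / freq.sum()
    return np.abs(freq - 1.0 / n_bins).mean()
```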
“…However, this type of residual diagnostic does not address whether the predicted uncertainties are useful. While many metrics for evaluating probabilistic predictions exist (e.g., Garg et al, 2022), here we discuss three.…”
Section: Evaluation Metrics (mentioning)
confidence: 99%
“…This is reminiscent of variational auto-encoders, where the hyperparameters nonetheless lie in the bottleneck layer. Yet, such an approach is much more difficult to train [111]. Indeed, those hyperparameters would be at the second level of a Bayesian hierarchy and would require more testing and investigation.…”
Section: Uncertainty Quantification (mentioning)
confidence: 99%
“…The reasons for this were the under-specification of the metrics used to report the results, the lack of access to the data used, and the lack of availability of the code necessary for evaluation. Thus, we were unable to reproduce the reported results and evaluate our method in the same way, which seems to be a general problem in machine-learning research [56,57].…”
Section: Appendix A (mentioning)
confidence: 99%