2008
DOI: 10.1002/met.51

Understanding forecast verification statistics

Abstract: Although there are numerous reasons for performing a verification analysis, there are usually two general questions that are of interest: are the forecasts good, and can we be confident that the estimate of forecast quality is not misleading? When calculating a verification score, it is not usually obvious how the score can answer either of these questions. Some procedures for attempting to answer the questions are reviewed, with particular focus on p-values and confidence intervals. P-values are sho…

Cited by 73 publications (70 citation statements)
References 58 publications
“…This is a rather strong assumption given the small amount of data concerning carbon exchange rates and the lack of repeated measurements that are needed to define variability in a climatological sense. It is more likely to get good scores by accident with a limited set of data (Mason, 2008), such as the monthly observations in the HOT and BATS time series.…”
Section: Discussion and Applications
confidence: 99%
“…The empirical distribution of score values was constructed with 10 000 random re-samplings of the observation (or simulation) time series and computing the verification index for each new set of model-data pairs. As pointed out by Mason (2008), p-values do not answer the question whether the score value is good, but rather they provide a degree of significance with respect to random combinations. The confidence interval is instead computed by means of the bootstrap technique, in which the choice of randomly permuted model-data pairs is done by replacing the extracted pairs in the original time series.…”
Section: Data Sets and Skill Indicators
confidence: 98%
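The bootstrap confidence interval described in the excerpt above can be sketched as follows. This is a minimal illustration, not the citing authors' code: Pearson correlation stands in for their verification index, the data are synthetic, and all names are hypothetical. The key idea matches the quote: model-data *pairs* are re-sampled with replacement, so the pairing (and hence the skill) is preserved in each replicate.

```python
import numpy as np

rng = np.random.default_rng(42)

def correlation_score(forecasts, observations):
    # Pearson correlation as an example verification score
    return float(np.corrcoef(forecasts, observations)[0, 1])

def bootstrap_ci(forecasts, observations, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap confidence interval for a verification score.

    Each replicate draws n forecast-observation pairs with replacement
    from the original series and recomputes the score.
    """
    n = len(forecasts)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample pair indices with replacement
        scores[b] = correlation_score(forecasts[idx], observations[idx])
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Synthetic monthly series: forecasts weakly related to observations
obs = rng.normal(size=120)
fcst = 0.6 * obs + 0.8 * rng.normal(size=120)
lo, hi = bootstrap_ci(fcst, obs)
```

The percentile interval is the simplest bootstrap variant; bias-corrected versions exist but the re-sampling of intact pairs is the part the excerpt emphasizes.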
“…The reference forecasts are generated using a permutation procedure (see e.g. Mason, 2008; Deque, 2012). The permutation procedure generates a new set of forecast-observation pairs in which the observations are unrelated to the forecasts except by chance.…”
Section: Variation Of Evaluation Scores With Forecast Lead Times
confidence: 99%
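The permutation procedure in this excerpt can be sketched as follows; again a minimal illustration with hypothetical names, using correlation as a stand-in score. Permuting the observations destroys any true forecast-observation relationship, so the permuted scores form the "unrelated except by chance" reference distribution, and the p-value is the fraction of chance pairings that score at least as well as the actual forecasts.

```python
import numpy as np

rng = np.random.default_rng(0)

def correlation_score(forecasts, observations):
    # Pearson correlation as an example verification score
    return float(np.corrcoef(forecasts, observations)[0, 1])

def permutation_p_value(forecasts, observations, n_perm=10_000):
    """One-sided permutation p-value for a verification score.

    Each permutation shuffles the observations, creating a set of
    forecast-observation pairs related only by chance.
    """
    actual = correlation_score(forecasts, observations)
    null_scores = np.empty(n_perm)
    for k in range(n_perm):
        shuffled = rng.permutation(observations)
        null_scores[k] = correlation_score(forecasts, shuffled)
    # proportion of chance pairings scoring at least as well
    return float(np.mean(null_scores >= actual)), actual

# Synthetic skilled forecasts: the p-value should be very small
obs = rng.normal(size=120)
fcst = 0.6 * obs + 0.8 * rng.normal(size=120)
p_value, actual_score = permutation_p_value(fcst, obs)
```

As Mason (2008) stresses, a small p-value here says only that the score is unlikely under chance pairings; it does not by itself say the forecasts are good.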
“…Bradley et al (2008) derive analytical expressions for the sampling variances of both the Brier score and the Brier skill score with respect to the sample climatological baseline. In broader overviews, Jolliffe (2007) reviews concepts in statistical inference as applicable to forecast verification, and Mason (2008) discusses the necessity of statistical inference on verification scores in order to evaluate whether a given result reflects meaningful skill above a naive baseline.…”
Section: Introduction
confidence: 99%
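The Brier score and its skill score relative to the sample climatological baseline, the quantities whose sampling variances Bradley et al. (2008) analyse, can be written down directly. This is a generic sketch of the standard definitions, not code from any of the cited papers.

```python
import numpy as np

def brier_score(prob_forecasts, outcomes):
    """Mean squared error of probability forecasts for a binary event."""
    p = np.asarray(prob_forecasts, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    return float(np.mean((p - y) ** 2))

def brier_skill_score(prob_forecasts, outcomes):
    """Skill relative to the sample climatology (the observed base rate)
    used as a constant reference forecast; positive values indicate skill
    above that naive baseline."""
    y = np.asarray(outcomes, dtype=float)
    base_rate = y.mean()
    bs = brier_score(prob_forecasts, y)
    bs_ref = brier_score(np.full_like(y, base_rate), y)
    return 1.0 - bs / bs_ref

# Toy example: sharp forecasts versus the climatological forecast
outcomes = [1, 0, 1, 0, 0, 1]
sharp = [0.9, 0.1, 0.8, 0.2, 0.1, 0.9]
clim = [0.5] * 6
bss_sharp = brier_skill_score(sharp, outcomes)
bss_clim = brier_skill_score(clim, outcomes)
```

Because the baseline is the *sample* base rate, the skill score of the climatological forecast itself is exactly zero, which is why inference on whether a positive BSS reflects real skill (rather than sampling noise) is needed.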