Accelerated sea ice loss and the possibility of ice-free summers in the Arctic have increased interest in potential human activities in the far North (Stephenson et al., 2011). To address the associated planning and safety concerns, government and private agencies need better predictions of sea ice at subseasonal to seasonal timescales (Jung et al., 2016). Over the past few years, many operational centers have begun to provide such forecasts with longer lead times, although the skill of these forecasts, and how to assess that skill in the first place, is still in question (Smith et al., 2015).

There are numerous metrics for quantifying the accuracy of a forecast against observations, or "true" conditions, with the choice depending on the variable in question (Wilks, 2019). Whether a forecast is considered skillful depends not only on the metric used to measure the forecast error, but also on the benchmark against which skill is measured. The skill of the forecast produced by a particular forecast system can be compared against that of an earlier version of the same system (e.g., Balan-Sarojini et al., 2019), a different forecast system (e.g., Zampieri