The adoption of knowledge-based dose-volume histogram (DVH) prediction models for assessing organ-at-risk (OAR) sparing in radiotherapy necessitates quantification of prediction accuracy and uncertainty. Moreover, DVH prediction error bands should be readily interpretable as confidence intervals in which to find a percentage of clinically acceptable DVHs. In the event such DVH error bands are not available, we present an independent error quantification methodology using a local reference cohort of high-quality treatment plans, and apply it to two DVH prediction models, ORBIT-RT and RapidPlan, trained on the same set of 90 volumetric modulated arc therapy (VMAT) plans. Organ-at-risk DVH predictions from each model were then generated for a separate set of 45 prostate VMAT plans. Dose-volume histogram predictions were then compared to their analogous clinical DVHs to define prediction errors $V_{\mathrm{clin},i} - V_{\mathrm{pred},i}$ ($i$th plan), from which prediction bias μ, prediction error variation σ, and root-mean-square error $\mathrm{RMSE}_{\mathrm{pred}} \equiv \sqrt{\frac{1}{N}\sum_i \left(V_{\mathrm{clin},i} - V_{\mathrm{pred},i}\right)^2} \cong \sqrt{\sigma^2 + \mu^2}$ could be calculated for the cohort. The empirical $\mathrm{RMSE}_{\mathrm{pred}}$ was then contrasted to the model-provided DVH error estimates. For all prostate OARs, above 50% Rx dose, ORBIT-RT μ and σ were comparable to or less than those of RapidPlan. Above 80% Rx dose, μ < 1% and σ < 3-4% for both models. As a result, above 50% Rx dose, ORBIT-RT $\mathrm{RMSE}_{\mathrm{pred}}$ was below that of RapidPlan, indicating slightly improved accuracy in this cohort. Because μ ≈ 0, $\mathrm{RMSE}_{\mathrm{pred}}$ is readily interpretable as a canonical standard deviation σ, whose error band is expected to correctly predict 68% of normally distributed clinical DVHs.
By contrast, RapidPlan's provided error band, although described in the literature as a standard deviation range, was slightly less predictive than $\mathrm{RMSE}_{\mathrm{pred}}$ (55-70% success), while the provided ORBIT-RT error band was confirmed to resemble an interquartile range (40-65% success) as described. Clinicians can apply this methodology using their own institutions' reference cohorts to (a) independently assess a knowledge-based model's predictive accuracy for local treatment plans, and (b) interpret from any error band whether further OAR dose sparing is likely attainable.
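The error statistics described above can be illustrated with a minimal sketch at a single dose level. This is not the authors' implementation: the function names `error_stats` and `band_coverage`, the symmetric half-width treatment of an error band, and the sample volumes are all assumptions made for illustration. The sketch uses the population standard deviation, for which the identity RMSE² = σ² + μ² holds exactly; with a sample (N−1) standard deviation it holds only approximately.

```python
import math
from statistics import mean, pstdev


def error_stats(v_clin, v_pred):
    """Return (bias mu, spread sigma, RMSE) of the errors V_clin,i - V_pred,i.

    v_clin and v_pred are equal-length sequences holding, for each plan i,
    the clinical and predicted OAR volume (%) at one fixed dose level.
    """
    errors = [c - p for c, p in zip(v_clin, v_pred)]
    mu = mean(errors)
    # Population SD (divides by N), so that rmse**2 == sigma**2 + mu**2.
    sigma = pstdev(errors)
    rmse = math.sqrt(mean([e * e for e in errors]))
    return mu, sigma, rmse


def band_coverage(v_clin, v_pred, half_width):
    """Fraction of plans whose clinical value lies within
    +/- half_width of the prediction (assumed symmetric band)."""
    inside = sum(1 for c, p in zip(v_clin, v_pred) if abs(c - p) <= half_width)
    return inside / len(v_clin)


# Hypothetical volumes (%) for four plans at one dose level.
v_clin = [10.0, 12.0, 14.0, 16.0]
v_pred = [11.0, 11.0, 15.0, 15.0]

mu, sigma, rmse = error_stats(v_clin, v_pred)
# Success rate of an error band whose half-width equals the empirical RMSE.
coverage = band_coverage(v_clin, v_pred, rmse)
```

Repeating the calculation at each dose level of interest gives μ, σ, and RMSE as functions of dose, and passing a model-provided half-width to `band_coverage` in place of the empirical RMSE gives that band's success rate for comparison against the 68% expected of a one-standard-deviation band.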