“…For DE, we follow the standard random initialization provided by PyTorch, while for VI, we set the variance of the prior distribution to 0.03. The parameter r in (15) for CM is set to r = 1, yielding standard model averaging [15], while r in (19) for PM is set to r = 45, chosen by numerically minimizing latency on a held-out dataset, with a_r = K^{1/r} following ([33], Table 1). The results are averaged over 50 different realizations of the calibration and test datasets, and the number of ensemble members K is set to 6.…”
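
To make the role of r and a_r concrete, the following is a minimal sketch that evaluates a_r = K^{1/r} for the stated values K = 6 and r = 45, and compares a power mean of order r = 1 (plain averaging) with a scaled power mean of order r = 45. Equations (15) and (19) are not reproduced in this excerpt, so the power-mean form used here is an assumption, and the function names and the example values are hypothetical.

```python
# Values taken from the text.
K = 6        # number of ensemble members
R_CM = 1     # exponent r used for CM (plain model averaging)
R_PM = 45    # exponent r used for PM


def scaling_factor(k: int, r: float) -> float:
    """Scaling a_r = K^(1/r), as stated in the text."""
    return k ** (1.0 / r)


def power_mean(values, r: float) -> float:
    """Power mean of order r: ((1/K) * sum(v^r))^(1/r).

    ASSUMPTION: this is only a plausible form for the averaging in
    (15) and (19); the excerpt does not show those equations.
    Requires r != 0 and nonnegative values.
    """
    k = len(values)
    return (sum(v ** r for v in values) / k) ** (1.0 / r)


def scaled_power_mean(values, r: float) -> float:
    """Hypothetical combination rule: a_r times the power mean of order r."""
    return scaling_factor(len(values), r) * power_mean(values, r)


if __name__ == "__main__":
    # Made-up per-model values, one per ensemble member (illustration only).
    example = [0.10, 0.50, 0.10, 0.50, 0.10, 0.50]
    assert len(example) == K

    print("a_r for K=6, r=45:", scaling_factor(K, R_PM))
    print("r=1 (arithmetic mean):", power_mean(example, R_CM))
    print("r=45 scaled power mean:", scaled_power_mean(example, R_PM))
```

For K = 6 and r = 45 the scaling factor is close to 1 (about 1.04), and a power mean of such a high order is dominated by the largest of the K values, whereas r = 1 weights all members equally.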