“…Beyond practical deployments of machine-learned models, the reproducibility crisis in the machine learning academic world has also been well documented: see [Pineau et al., 2021] and the references therein for an excellent discussion of the reasons for irreproducibility (insufficient exploration of hyperparameters and experimental setups, lack of sufficient documentation, inaccessible code, and differing computational hardware) and for mitigation recommendations. However, recent papers [D'Amour et al., 2020, Dusenberry et al., 2020, Snapp and Shamir, 2021, Summers and Dinneen, 2021, Yu et al., 2021] have also demonstrated that even when models are trained on identical datasets with identical optimization algorithms, architectures, and hyperparameters, they can produce significantly different predictions on the same example. This type of irreproducibility may be caused by multiple factors [D'Amour et al., 2020, Fort et al., 2020, Frankle et al., 2020, Shallue et al., 2018, Snapp and Shamir, 2021, Summers and Dinneen, 2021], such as non-convexity of the objective, random initialization, nondeterminism in training (e.g., data shuffling, parallelism, random schedules, and the hardware used), and round-off and quantization errors.…”
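One of the round-off effects alluded to above can be seen directly: floating-point addition is not associative, so a parallel reduction that accumulates the same gradient terms in a different order can yield a different sum. A minimal sketch in plain Python, with hypothetical values chosen to make the effect visible:

```python
# Floating-point addition is not associative: the order in which terms
# are accumulated (e.g., under different parallel reduction schedules)
# can change the result. Values below are illustrative only.
terms = [1e16, 1.0, -1e16]

left_to_right = (terms[0] + terms[1]) + terms[2]  # 1.0 is absorbed into 1e16
reordered = (terms[0] + terms[2]) + terms[1]      # the large terms cancel first

print(left_to_right)  # 0.0
print(reordered)      # 1.0
```

When such order-dependent sums feed a non-convex optimizer over millions of steps, two runs that are identical on paper can diverge to different minima, which is one mechanism behind the prediction differences described above.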