“…In distributionally robust optimisation (DRO) (Ben-Tal et al., 2013; Rahimian and Mehrotra, 2019), one aims to minimise the worst-case expected loss over an ‘uncertainty set’ of distributions. In the group DRO setting (Hu et al., 2018; Oren et al., 2019; Sagawa et al., 2020), this amounts to minimising the loss of the (instantaneously) worst-performing group of examples. In the context of neural network optimisation, given training data already divided into groups, Sagawa et al. (2020) minimise this empirical worst-group risk while demonstrating the importance of simultaneously enhancing generalisability through greater regularisation.…”
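
To make the quantity concrete, the empirical worst-group risk can be sketched as the largest per-group mean loss. The snippet below is a minimal illustration only, not the implementation of Sagawa et al. (2020): the function name `worst_group_risk`, the toy losses, and the group labels are all assumptions introduced here, and it assumes per-example losses and group assignments have already been computed.

```python
from collections import defaultdict


def worst_group_risk(losses, group_ids):
    """Return (worst-group risk, worst group id, per-group risks).

    Hypothetical helper for illustration: each group's risk is the mean
    loss of the examples assigned to that group.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(losses, group_ids):
        sums[group] += loss
        counts[group] += 1
    # Empirical risk of each group: mean loss over that group's examples.
    risks = {group: sums[group] / counts[group] for group in sums}
    # Group DRO targets the group whose current risk is largest.
    worst = max(risks, key=risks.get)
    return risks[worst], worst, risks


# Toy data, made up for illustration: five examples in three groups.
losses = [0.2, 0.4, 1.0, 2.0, 0.1]
group_ids = [0, 0, 1, 1, 2]

risk, group, per_group = worst_group_risk(losses, group_ids)
average_risk = sum(losses) / len(losses)
```

On this toy data the average loss is about 0.74, while the worst-group risk is 1.5 (group 1). A training loop in the group DRO style would recompute this quantity at each step and descend on it rather than on the average, which is why the worst group is described as instantaneous: it can change as the parameters are updated.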