“…Most of the existing literature on decentralized learning focuses on the standard loss minimization formulation [41,43], i.e., $\min_{x \in \mathbb{R}^{d}} f(x)$, where $f(\cdot)$ is the objective loss function, $x$ denotes the global model parameters to be learned, and $d$ is the model dimension. A wide range of machine learning applications can be modeled by this standard decentralized loss minimization formulation (e.g., robotic networks [15,32], network resource allocation [12,36], and power networks [2,7]). Some recent works [20,21,25,44] studied decentralized min-max optimization problems, i.e., $\min_{x \in \mathbb{R}^{d_1}} \max_{y \in \mathbb{R}^{d_2}} f(x, y)$, which are a special case of bilevel optimization problems (with the same outer- and inner-level objective).…”
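The special-case claim in the excerpt can be made concrete with a short derivation. This is only a sketch: the outer objective $F$, the inner objective $g$, and the inner solution $y^{*}(x)$ are symbols assumed here for illustration and do not come from the excerpt, and the inner maximum is assumed to be attained.

```latex
% Generic bilevel problem (F, g, y^* are illustrative symbols, not from the excerpt)
\begin{align*}
  \min_{x \in \mathbb{R}^{d_1}} \; & F\bigl(x, y^{*}(x)\bigr)
  \quad \text{s.t.} \quad
  y^{*}(x) \in \operatorname*{arg\,min}_{y \in \mathbb{R}^{d_2}} g(x, y). \\
\intertext{Choosing $F = f$ and $g = -f$, so that both levels share the objective $f$ up to sign:}
  y^{*}(x) & \in \operatorname*{arg\,max}_{y \in \mathbb{R}^{d_2}} f(x, y)
  \quad \Longrightarrow \quad
  F\bigl(x, y^{*}(x)\bigr) = \max_{y \in \mathbb{R}^{d_2}} f(x, y), \\
  \min_{x \in \mathbb{R}^{d_1}} F\bigl(x, y^{*}(x)\bigr)
  & = \min_{x \in \mathbb{R}^{d_1}} \max_{y \in \mathbb{R}^{d_2}} f(x, y).
\end{align*}
```

Under these assumptions the bilevel problem reduces to the min-max problem, whereas a general bilevel problem allows $g$ to be unrelated to $F$.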