“…Currently, a number of important machine learning and deep learning tasks can be captured by hierarchical models, including hyper-parameter optimization [1], [2], [3], [4], neural architecture search [5], [6], [7], meta learning [8], [9], [10], Generative Adversarial Networks (GANs) [11], [12], reinforcement learning [13], and image processing [14], [15], [16], [17]. In general, these hierarchical models can be formulated as the following Bi-Level Optimization (BLO) problem [18], [19], [20]:

\[
\min_{x \in \mathcal{X}} F(x, y), \quad \text{s.t.} \quad y \in \mathcal{S}(x) := \arg\min_{y} f(x, y), \tag{1}
\]

where $x \in \mathcal{X}$ is the Upper-Level (UL) variable, $y \in \mathbb{R}^n$ is the Lower-Level (LL) variable, the UL objective $F(x, y) : \mathcal{X} \times \mathbb{R}^n \to \mathbb{R}$ and the LL objective $f(x, y) : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ are continuously differentiable and jointly continuous functions, and the UL constraint set $\mathcal{X} \subset \mathbb{R}^m$ is compact.…”
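As a concrete illustration of formulation (1), the following minimal sketch solves a toy scalar BLO problem. Everything in it is an assumption made for illustration and is not taken from the source: the objectives $F(x, y) = (x - 1)^2 + (y - 2)^2$ and $f(x, y) = (y - x)^2$, the compact set $\mathcal{X} = [0, 3]$, and the function names. The LL problem is solved approximately by gradient descent for each candidate $x$, and the UL problem by a plain grid search over $\mathcal{X}$; this nested scheme is only practical for tiny problems and is not the method of any cited work.

```python
# Toy bilevel problem (all choices below are illustrative assumptions):
#   UL objective: F(x, y) = (x - 1)^2 + (y - 2)^2, with x in X = [0, 3]
#   LL objective: f(x, y) = (y - x)^2, so S(x) = {x}


def lower_level_solution(x, steps=200, lr=0.1):
    """Approximate an element of S(x) = argmin_y f(x, y) by gradient descent."""
    y = 0.0
    for _ in range(steps):
        grad_y = 2.0 * (y - x)  # derivative of f(x, y) = (y - x)^2 w.r.t. y
        y -= lr * grad_y
    return y


def upper_objective(x, y):
    """UL objective F(x, y)."""
    return (x - 1.0) ** 2 + (y - 2.0) ** 2


def solve_bilevel(lo=0.0, hi=3.0, num_points=301):
    """Grid search over the compact set X = [lo, hi].

    For each candidate x, the LL problem is solved first, and F is then
    evaluated at the pair (x, y(x)).
    """
    best_x, best_y, best_val = None, None, float("inf")
    for i in range(num_points):
        x = lo + (hi - lo) * i / (num_points - 1)
        y = lower_level_solution(x)
        val = upper_objective(x, y)
        if val < best_val:
            best_x, best_y, best_val = x, y, val
    return best_x, best_y, best_val


if __name__ == "__main__":
    x_star, y_star, f_star = solve_bilevel()
    print(f"x* = {x_star:.3f}, y* = {y_star:.3f}, F = {f_star:.3f}")
```

Because the LL solution here is $y = x$, the UL problem reduces to minimizing $(x - 1)^2 + (x - 2)^2$ over $[0, 3]$, whose minimizer is $x = 1.5$ with $F = 0.5$; the sketch recovers this up to numerical tolerance. Note that the UL variable cannot reach its unconstrained optimum $(x, y) = (1, 2)$, because $y$ is tied to $x$ through the LL problem.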