2021
DOI: 10.48550/arxiv.2101.11517
Preprint

Investigating Bi-Level Optimization for Learning and Vision from a Unified Perspective: A Survey and Beyond

Abstract: Bi-Level Optimization (BLO) originated in the area of economic game theory and was later introduced into the optimization community. BLO is able to handle problems with a hierarchical structure, involving two levels of optimization tasks, where one task is nested inside the other. In machine learning and computer vision, despite the different motivations and mechanisms, many complex problems, such as hyper-parameter optimization, multi-task and meta-learning, neural architecture search, adversarial …

Cited by 15 publications (22 citation statements) · References 133 publications
“…However, as a value-function, ϕ(x) is non-smooth, non-convex, and may even have jumps, and is thus ill-conditioned, so we use a smooth function to approximate ϕ(x) and obtain ∂ϕ(x)/∂x. Existing methods can be classified into two categories according to how they calculate ∂ϕ(x)/∂x [20], [35], [36], i.e., Explicit Gradient-Based Methods (EGBMs), which derive the gradient by Automatic Differentiation (AD), and Implicit Gradient-Based Methods (IGBMs), which apply the implicit function theorem to the optimality conditions of the LL problem.…”
Section: Related Work · mentioning (confidence: 99%)
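As a rough, hypothetical sketch of the EGBM route described in this statement (not code from the survey or the citing paper), the snippet below unrolls a few lower-level gradient steps and lets automatic differentiation propagate through them to obtain the hypergradient; the quadratic objectives, step size, and iteration count are assumptions for illustration.

```python
import jax
import jax.numpy as jnp

# Toy bilevel pair (illustrative assumptions, not from the paper):
def f(x, y):   # lower-level (LL) objective, minimized over y for fixed x
    return jnp.sum((y - x) ** 2) + 0.1 * jnp.sum(y ** 2)

def F(x, y):   # upper-level (UL) objective, evaluated at the approximate LL solution
    return jnp.sum((y - 1.0) ** 2)

def ll_solver(x, y0, steps=50, lr=0.1):
    # Unrolled gradient descent on f(x, .); every step stays in the AD graph (EGBM-style).
    grad_f = jax.grad(f, argnums=1)
    y = y0
    for _ in range(steps):
        y = y - lr * grad_f(x, y)
    return y

def ul_objective(x, y0):
    return F(x, ll_solver(x, y0))

x, y0 = jnp.array([0.5, -0.3]), jnp.zeros(2)
hypergrad = jax.grad(ul_objective)(x, y0)  # dF/dx, obtained by differentiating through the unrolled solver
print(hypergrad)
```

The IGBM alternative avoids unrolling entirely by differentiating the LL optimality conditions; a matching sketch appears after the last citation statement on this page.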
“…Currently, a number of important machine learning and deep learning tasks can be captured by hierarchical models, such as hyper-parameter optimization [1], [2], [3], [4], neural architecture search [5], [6], [7], meta learning [8], [9], [10], Generative Adversarial Networks (GANs) [11], [12], reinforcement learning [13], image processing [14], [15], [16], [17], and so on. In general, these hierarchical models can be formulated as the following Bi-Level Optimization (BLO) problem [18], [19], [20]:

$$\min_{x \in \mathcal{X}} F(x, y), \quad \text{s.t.} \quad y \in \mathcal{S}(x) := \arg\min_{y} f(x, y), \tag{1}$$

where $x \in \mathcal{X}$ is the Upper-Level (UL) variable, $y \in \mathbb{R}^n$ is the Lower-Level (LL) variable, the UL objective $F(x, y): \mathcal{X} \times \mathbb{R}^n \to \mathbb{R}$ and the LL objective $f(x, y): \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ are continuously differentiable and jointly continuous functions, and the UL constraint $\mathcal{X} \subset \mathbb{R}^m$ is a compact set.…”
Section: Introduction · mentioning (confidence: 99%)
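To make formulation (1) concrete, one common instance is hyper-parameter optimization, which the statement lists among its applications; the specific losses below are an illustrative assumption rather than the paper's notation. Let $x$ be a (log) regularization weight, $y$ the model parameters, $f$ the regularized training loss, and $F$ the validation loss:

$$f(x, y) = \frac{1}{|\mathcal{D}_{\mathrm{tr}}|} \sum_{(a, b) \in \mathcal{D}_{\mathrm{tr}}} \ell(y; a, b) + e^{x}\,\lVert y \rVert_2^2, \qquad F(x, y) = \frac{1}{|\mathcal{D}_{\mathrm{val}}|} \sum_{(a, b) \in \mathcal{D}_{\mathrm{val}}} \ell(y; a, b),$$

so that solving (1) selects the regularization weight whose induced training solution $y \in \mathcal{S}(x)$ performs best on held-out data.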
“…In the past decade, researchers have discovered numerous applications of bi-level programming in machine learning, including meta-learning (ML) [9], adversarial learning [16], hyperparameter optimization [23] and neural architecture search [19]. These newly found bilevel programs in ML are often solved by descent methods, which require differentiating through the (usually unconstrained) lower-level optimization problem [20]. The differentiation can be carried out either implicitly on the optimality conditions as in the conventional sensitivity analysis [see e.g., 1, 32, 3], or explicitly by unrolling the numerical procedure used to solve the lower-level problem [see e.g., 23, 10].…”
Section: Related Work · mentioning (confidence: 99%)
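As a complementary, hypothetical sketch of the implicit route mentioned in the statement above (differentiating the lower-level optimality conditions rather than unrolling the solver), the snippet below reuses the same toy objectives as the earlier EGBM sketch; the closed-form lower-level solution holds only for this particular quadratic choice of f and is an assumption made for illustration.

```python
import jax
import jax.numpy as jnp

# Same toy bilevel pair as in the EGBM sketch (illustrative assumptions only):
def f(x, y):   # lower-level objective, assumed strongly convex in y
    return jnp.sum((y - x) ** 2) + 0.1 * jnp.sum(y ** 2)

def F(x, y):   # upper-level objective
    return jnp.sum((y - 1.0) ** 2)

def implicit_hypergradient(x, y_star):
    # At an unconstrained LL solution, grad_y f(x, y*) = 0; the implicit function theorem gives
    #   dy*/dx = -(d^2 f / dy dy)^{-1} (d^2 f / dy dx),
    # so the hypergradient is recovered without differentiating through the LL solver.
    grad_y_f = jax.grad(f, argnums=1)
    H_yy = jax.jacobian(grad_y_f, argnums=1)(x, y_star)
    H_yx = jax.jacobian(grad_y_f, argnums=0)(x, y_star)
    dy_dx = -jnp.linalg.solve(H_yy, H_yx)
    dF_dx = jax.grad(F, argnums=0)(x, y_star)
    dF_dy = jax.grad(F, argnums=1)(x, y_star)
    return dF_dx + dy_dx.T @ dF_dy

x = jnp.array([0.5, -0.3])
y_star = x / 1.1   # closed-form minimizer of this particular quadratic f
print(implicit_hypergradient(x, y_star))
```

For large models, forming and inverting the Hessian explicitly as above is impractical; practical IGBMs typically approximate the linear solve, e.g. with conjugate gradients or truncated Neumann series, but the structure of the computation is the same.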
“…First introduced in the field of economic game theory by Stackelberg (1934), this problem has recently received increasing attention in the machine learning community (Domke, 2012; Gould et al., 2016; Liao et al., 2018; Blondel et al., 2021; Liu et al., 2021; Shaban et al., 2019). Indeed, many machine learning applications can be reduced to (1), including hyper-parameter optimization (Feurer and Hutter, 2019), meta-learning (Bertinetto et al., 2018), reinforcement learning (Hong et al., 2020b; Liu et al., 2021) or dictionary learning (Mairal et al., 2011; Lecouat et al., 2020a;b).…”
Section: Introduction · mentioning (confidence: 99%)