ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054474

Solving Non-Convex Non-Differentiable Min-Max Games Using Proximal Gradient Method

Abstract: Min-max saddle point games appear in a wide range of applications in machine learning and signal processing. Despite their wide applicability, theoretical studies are mostly limited to the special convex-concave structure. While some recent works generalized these results to special smooth non-convex cases, our understanding of non-smooth scenarios is still limited. In this work, we study a special form of non-smooth min-max games when the objective function is (strongly) convex with respect to one of the player'…
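As a rough sketch of the problem class the abstract describes (the particular splitting into a smooth coupling term and non-smooth convex regularizers, and the step sizes below, are illustrative assumptions rather than the paper's exact formulation), a non-smooth min-max problem and one proximal gradient ascent/descent update can be written as:

```latex
% Illustrative non-smooth min-max problem: f smooth, g and h convex but
% possibly non-differentiable (this splitting is an assumption, not
% necessarily the paper's exact setup).
\min_{\theta \in \mathbb{R}^{d}} \; \max_{\alpha \in \mathbb{R}^{m}} \;
  f(\theta, \alpha) + g(\theta) - h(\alpha)

% One proximal ascent step on alpha followed by a proximal descent step
% on theta, with step sizes eta and gamma:
\alpha^{k+1} = \operatorname{prox}_{\eta h}\!\bigl( \alpha^{k} + \eta \,\nabla_{\alpha} f(\theta^{k}, \alpha^{k}) \bigr),
\qquad
\theta^{k+1} = \operatorname{prox}_{\gamma g}\!\bigl( \theta^{k} - \gamma \,\nabla_{\theta} f(\theta^{k}, \alpha^{k+1}) \bigr)
```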

Cited by 10 publications (19 citation statements). References 26 publications.
“…At the early stage of learning starting with random Q-network weight initialization and random policy-network weight initialization, there is little Q-value difference with respect to either state or action, as seen in Figs. 2(b) and 2(d), so the entropy term is dominant in the policy update (6) and the policy entropy increases with the policy distribution approaching the uniform distribution, as seen in Fig. 2(c).…”
Section: Saturation (mentioning; confidence: 93%)
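To make the quoted observation concrete, here is a minimal numerical sketch (it does not reproduce the citing paper's update (6); the temperature value and action count are arbitrary assumptions): when the Q-values are nearly identical, an entropy-regularized softmax policy is essentially uniform and its entropy sits near the maximum log|A|.

```python
import numpy as np

# Illustrative sketch only (not the citing paper's exact update (6)):
# with a nearly flat Q-function, the softmax policy
#   pi(a|s) proportional to exp(Q(s, a) / alpha)
# is dominated by the entropy term and stays close to uniform.
rng = np.random.default_rng(0)
n_actions = 4
alpha = 1.0                                # entropy temperature (assumed value)

q_flat = rng.normal(0.0, 1e-3, n_actions)  # early training: Q-values nearly equal
pi = np.exp(q_flat / alpha)
pi /= pi.sum()                             # softmax over actions

entropy = -(pi * np.log(pi)).sum()
print(pi)                                  # roughly [0.25, 0.25, 0.25, 0.25]
print(entropy, np.log(n_actions))          # entropy close to log(4) ~ 1.386
```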
“…Indeed, it is seen in Fig. 1(c) that (6) does not affect the policy update, only the second term H(π(•|s t )) works, and thus the policy update yields π to converge to the uniform policy for every state maximizing the total entropy. Hence, the exploration radius in the case of α Q = 0 is almost the same as that of the uniform policy, as seen in Fig.…”
Section: Saturation (mentioning; confidence: 98%)
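The quoted argument rests on the standard fact that, over the probability simplex, entropy is maximized by the uniform distribution; written out (notation assumed, not taken from the citing paper):

```latex
% With the Q-term switched off, the update only maximizes the entropy
% H(pi(.|s)), whose maximizer over the simplex is the uniform policy:
\max_{\pi(\cdot \mid s) \in \Delta(\mathcal{A})} \mathcal{H}\bigl(\pi(\cdot \mid s)\bigr)
  = \log |\mathcal{A}|,
\qquad \text{attained at } \pi(a \mid s) = \tfrac{1}{|\mathcal{A}|} \ \ \forall a \in \mathcal{A}.
```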
“…However, computing a Nash equilibrium point is NP-hard in general [22,19], and it may not even exist [23]. As a result, since we are considering the general non-convex non-concave regime, we settle for computing a first-order Nash equilibrium point [24,25] defined next.…”
Section: Formulation of the Min-Max Optimization Problem (mentioning; confidence: 99%)
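For context, one common way to formalize a first-order Nash equilibrium of min_theta max_alpha f(theta, alpha) over constraint sets Theta and A is through per-player first-order stationarity; the precise definition used in [24,25] may differ in details, so the statement below should be read as an assumption:

```latex
% (theta*, alpha*) is a first-order Nash equilibrium if neither player
% has a feasible first-order improving direction at the point:
\langle \nabla_{\theta} f(\theta^{*}, \alpha^{*}),\, \theta - \theta^{*} \rangle \ge 0
  \quad \forall\, \theta \in \Theta,
\qquad
\langle \nabla_{\alpha} f(\theta^{*}, \alpha^{*}),\, \alpha - \alpha^{*} \rangle \le 0
  \quad \forall\, \alpha \in \mathcal{A}.
```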