2021
DOI: 10.48550/arxiv.2102.04376
Preprint

Adversarially Guided Actor-Critic

Yannis Flet-Berliac,
Johan Ferret,
Olivier Pietquin
et al.

Abstract: Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck. These methods consider a policy (the actor) and a value function (the critic) whose respective losses are built using different motivations and approaches. This paper introduces a third protagonist: the adversary. While the adversary mimics the actor by minimizing the KL-divergence betw…
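As a rough sketch of the mechanism the abstract describes: the adversary is a third policy trained to imitate the actor, while the actor is pushed to remain distinguishable from it. The equations below are a minimal LaTeX sketch under that reading; the KL direction, the bonus coefficient c, and the exact form of the actor bonus are assumptions, not taken from the truncated abstract.

    % Adversary objective (assumed form): imitate the actor by minimizing the
    % KL-divergence between the actor's policy \pi and the adversary's policy \pi_{adv}
    \mathcal{L}_{\mathrm{adv}} = \mathbb{E}_{s}\left[ D_{\mathrm{KL}}\big( \pi(\cdot \mid s) \,\|\, \pi_{\mathrm{adv}}(\cdot \mid s) \big) \right]

    % Actor objective (assumed form): the usual advantage plus a discrepancy bonus
    % that rewards actions the adversary fails to predict, encouraging exploration
    A_{\mathrm{AGAC}}(s, a) = A(s, a) + c \,\big( \log \pi(a \mid s) - \log \pi_{\mathrm{adv}}(a \mid s) \big)

Under this reading, the two objectives are in direct conflict, which is what would drive the exploration behaviour the abstract refers to.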

Cited by 2 publications (2 citation statements, both of the "mentioning" type and published in 2021). References 14 publications.
“…Such algorithms can be split into two main sets and can be distinguished by whether the actions (defined by numbers) taken by the agent are discrete or continuous. Algorithms such as Deep Q-Learning [31] or Actor-Critic methods [32] use a discrete action space (convenient when one can take only a finite amount of actions), while algorithms such as the soft Actor-Critic method [22] and the Deep Deterministic Policy Gradient method [33] were developed for when the actions can take any real value.…”
Section: Continuous Action Space Reinforcement Learning (mentioning)
confidence: 99%
“…Emergence in multi-agent setting: Multi-agent competition can provide a mechanism for driving RL agents to automatically learn increasingly complex behavior [27]. As each agent adapts, it makes the learning problem for the other agent increasingly difficult, leading to the emergence of an automatic curriculum of challenging learning tasks [2,10,47,14]. For example, Schmidhuber [38] proposed having two classifiers compete by repeatedly selecting examples which they can classify but which the other cannot.…”
Section: Related Work (mentioning)
confidence: 99%