2021
DOI: 10.48550/arxiv.2102.04376
Preprint

Adversarially Guided Actor-Critic

Yannis Flet-Berliac,
Johan Ferret,
Olivier Pietquin
et al.

Abstract: Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck. These methods consider a policy (the actor) and a value function (the critic) whose respective losses are built using different motivations and approaches. This paper introduces a third protagonist: the adversary. While the adversary mimics the actor by minimizing the KL-divergence betw…
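As a rough sketch of the mechanism the abstract describes: the adversary is a third policy trained to imitate the actor, while the actor is pushed to remain distinguishable from it. The equations below are a minimal LaTeX sketch under that reading; the KL direction, the bonus coefficient c, and the exact form of the actor bonus are assumptions, not taken from the truncated abstract.

    % Adversary objective (assumed form): imitate the actor by minimizing the
    % KL-divergence between the actor's policy \pi and the adversary's policy \pi_{adv}
    \mathcal{L}_{\mathrm{adv}} = \mathbb{E}_{s}\left[ D_{\mathrm{KL}}\big( \pi(\cdot \mid s) \,\|\, \pi_{\mathrm{adv}}(\cdot \mid s) \big) \right]

    % Actor objective (assumed form): the usual advantage plus a discrepancy bonus
    % that rewards actions the adversary fails to predict, encouraging exploration
    A_{\mathrm{AGAC}}(s, a) = A(s, a) + c \,\big( \log \pi(a \mid s) - \log \pi_{\mathrm{adv}}(a \mid s) \big)

Under this reading, the two objectives are in direct conflict, which is what would drive the exploration behaviour the abstract refers to.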

Cited by 2 publications (2 citation statements, both of the "mentioning" type and published in 2021). References 14 publications.
“…Such algorithms can be split into two main sets and can be distinguished by whether the actions (defined by numbers) taken by the agent are discrete or continuous. Algorithms such as Deep Q-Learning [31] or Actor-Critic methods [32] use a discrete action space (convenient when one can take only a finite amount of actions), while algorithms such as the soft Actor-Critic method [22] and the Deep Deterministic Policy Gradient method [33] were developed for when the actions can take any real value.…”
Section: Continuous Action Space Reinforcement Learning (mentioning)
confidence: 99%
“…Emergence in multi-agent setting: Multi-agent competition can provide a mechanism for driving RL agents to automatically learn increasingly complex behavior [27]. As each agent adapts, it makes the learning problem for the other agent increasingly difficult, leading to the emergence of an automatic curriculum of challenging learning tasks [2,10,47,14]. For example, Schmidhuber [38] proposed having two classifiers compete by repeatedly selecting examples which they can classify but which the other cannot.…”
Section: Related Work (mentioning)
confidence: 99%