2018
DOI: 10.48550/arxiv.1810.06284
Preprint

CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning

Abstract: In open-ended environments, autonomous learning agents must set their own goals and build their own curriculum through an intrinsically motivated exploration. They may consider a large diversity of goals, aiming to discover what is controllable in their environments, and what is not. Because some goals might prove easy and some impossible, agents must actively select which goal to practice at any moment, to maximize their overall mastery on the set of learnable goals. This paper proposes CURIOUS, an algorithm …
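
The abstract's point about actively selecting which goal to practice can be illustrated with a short sketch. CURIOUS tracks competence per goal module and favors modules whose competence is changing the most (absolute learning progress). The sketch below is only an illustration of that idea under stated assumptions, not the paper's implementation: the `GoalModuleSelector` class, its window size, and the epsilon-greedy rule are all hypothetical choices made here for clarity.

```python
import random
from collections import deque

class GoalModuleSelector:
    """Illustrative sketch (hypothetical class, not the authors' code):
    sample goal modules in proportion to absolute learning progress (LP),
    with an epsilon-greedy chance of picking a random module."""

    def __init__(self, n_modules, window=100, eps=0.2):
        self.n_modules = n_modules
        self.eps = eps
        # Recent success/failure history per module, used to estimate competence.
        self.history = [deque(maxlen=window) for _ in range(n_modules)]

    def update(self, module, success):
        """Record the outcome (True/False) of an episode aimed at `module`."""
        self.history[module].append(float(success))

    def _learning_progress(self, module):
        """Absolute difference in success rate between the newer and older
        halves of the recent history; 0 if there is too little data."""
        h = list(self.history[module])
        if len(h) < 4:
            return 0.0
        half = len(h) // 2
        old, new = h[:half], h[half:]
        return abs(sum(new) / len(new) - sum(old) / len(old))

    def sample_module(self):
        """With probability eps pick a random module (keep exploring);
        otherwise sample proportionally to absolute learning progress."""
        if random.random() < self.eps:
            return random.randrange(self.n_modules)
        lp = [self._learning_progress(m) for m in range(self.n_modules)]
        if sum(lp) == 0.0:
            return random.randrange(self.n_modules)
        return random.choices(range(self.n_modules), weights=lp, k=1)[0]
```

An agent would call sample_module() before each episode and update() after it; modules whose competence is rising or dropping fastest get practiced most, while mastered and impossible goals (flat competence) are sampled mainly through the random-exploration branch.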

Cited by 11 publications (17 citation statements) · References 25 publications
“…Finally, some prior work considers exploration in the space of goals [Colas et al, 2018, Held et al, 2017, Nair et al, 2018, Pong et al, 2019]. In Appendix D.3, we also discuss how goal-conditioned RL [Kaelbling, 1993, Schaul et al, 2015] can be viewed as a special case of State Marginal Matching when the goal-sampling distribution is learned jointly with the policy.…”
Section: Related Work (mentioning)
confidence: 99%
“…Empowerment and reward-free RL. Task-agnostic reward functions have been proposed to encourage exploration in environments using notions of curiosity or novelty (Schmidhuber, 1991; Oudeyer & Kaplan, 2009; Schmidhuber, 2010; Bellemare et al, 2016; Pathak et al, 2017; Colas et al, 2018). In a similar vein, some methods maximize the state-visitation entropy (Hazan et al, 2018; Pong et al, 2019; Lee et al, 2019; Ghasemipour et al, 2019).…”
Section: Related Work (mentioning)
confidence: 99%
“…The current state is the concatenation of the current observations and the difference between the current observations and the initial observations, s_t = [o_t, Δo_t] with Δo_t = o_t − o_0. We consider fixed-length episodes, where the agent sets its own goal for the whole episode. Our agent belongs to the framework of IMGEP [25] and is based on the CURIOUS [22] algorithm. In the same spirit it sets its own goals and uses intrinsic motivations to guide its learning trajectory.…”
Section: Language Enhanced Explorer (mentioning)
confidence: 99%
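
The state construction quoted in the last excerpt (current observation concatenated with its change since the start of the episode) amounts to a one-liner. The sketch below assumes NumPy observation vectors; the `build_state` helper name is hypothetical and only restates the excerpt's formula.

```python
import numpy as np

def build_state(o_t: np.ndarray, o_0: np.ndarray) -> np.ndarray:
    """Hypothetical helper: s_t = [o_t, Δo_t] with Δo_t = o_t - o_0,
    i.e. the current observation concatenated with its change since reset."""
    delta_o_t = o_t - o_0
    return np.concatenate([o_t, delta_o_t])

# Example with 3-dimensional observations:
o_0 = np.array([0.0, 0.5, 1.0])   # observation at episode start
o_t = np.array([0.2, 0.5, 0.4])   # current observation
s_t = build_state(o_t, o_0)        # -> [0.2, 0.5, 0.4, 0.2, 0.0, -0.6]
```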