2021
DOI: 10.1109/tcds.2019.2933371
CLIC: Curriculum Learning and Imitation for Object Control in Nonrewarding Environments

Abstract: In this paper we study a new reinforcement learning setting where the environment is non-rewarding, contains several possibly related objects of various controllability, and where an apt agent Bob acts independently, with non-observable intentions. We argue that this setting defines a realistic scenario and we present a generic discrete-state discrete-action model of such environments. To learn in this environment, we propose an unsupervised reinforcement learning agent called CLIC for Curriculum Learning and …

Cited by 14 publications (14 citation statements)

References 35 publications (37 reference statements)
“…DisTop simultaneously learns skills, their goal representation, and which skill to train on. It contrasts with several methods that focus exclusively on selecting which skill to train on, assuming a good goal representation is available [23,17,24,62,18]. They either select goals according to a curriculum defined by intermediate difficulty and learning progress [47] or imagine new language-based goals [18].…”
Section: Related Work
confidence: 99%
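The curriculum mechanism mentioned above — selecting which goal to train on from learning progress — can be illustrated with a minimal sketch. The class name `LPGoalSampler`, the window size, and the exploration floor are all hypothetical choices, not part of any cited method: learning progress is estimated as the change in success rate between two recent windows, and goals are sampled in proportion to its magnitude.

```python
import random
from collections import deque

class LPGoalSampler:
    """Hypothetical sketch: sample goals in proportion to absolute
    learning progress (LP), i.e. the recent change in success rate."""

    def __init__(self, goals, window=10, eps=0.1):
        self.goals = list(goals)
        self.window = window
        self.eps = eps  # exploration floor so stalled goals are revisited
        self.history = {g: deque(maxlen=2 * window) for g in self.goals}

    def learning_progress(self, goal):
        h = self.history[goal]
        if len(h) < 2 * self.window:
            return 1.0  # optimistic init: barely tried goals look promising
        old = sum(list(h)[: self.window]) / self.window
        new = sum(list(h)[self.window:]) / self.window
        return abs(new - old)

    def sample(self):
        if random.random() < self.eps:
            return random.choice(self.goals)
        lps = [self.learning_progress(g) for g in self.goals]
        if sum(lps) == 0:
            return random.choice(self.goals)  # all goals stalled
        return random.choices(self.goals, weights=lps, k=1)[0]

    def update(self, goal, success):
        self.history[goal].append(float(success))
```

The absolute value makes both improvement and forgetting attractive, which is a common design choice in learning-progress curricula; a goal whose success rate is flat (mastered or hopeless) receives little training time.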
“…Such techniques are called active imitation learning or interactive learning, and echo psychological descriptions of infants' selectivity toward social partners and its link to their motivation to learn [40,41]. Active imitation learning has been implemented [42] with an agent that learns when to imitate using intrinsic motivation, for a hierarchical RL problem in a discrete setting. For continuous action, state, and goal spaces, the SGIM-ACTS algorithm [38] uses intrinsic motivation to choose not only the kind of demonstrations, but also when to request demonstrations and whom to ask among several teachers.…”
Section: Active Imitation Learning (Social Guidance)
confidence: 99%
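The "when to imitate" decision described in this statement can be sketched in a few lines. This is not the cited implementation — the class name, the stall threshold, and the optimistic default are illustrative assumptions — but it captures the intrinsic-motivation logic: request a demonstration only once autonomous practice on a goal stops yielding progress.

```python
class ActiveImitationPolicy:
    """Hypothetical sketch of 'when to imitate': request a demonstration
    only when autonomous progress on the current goal has stalled."""

    def __init__(self, stall_threshold=0.05):
        self.stall_threshold = stall_threshold
        self.progress = {}  # goal -> latest learning-progress estimate

    def record_progress(self, goal, lp):
        self.progress[goal] = lp

    def should_request_demo(self, goal):
        # Optimistic default (1.0): keep practising goals never measured.
        # Imitate once self-practice no longer improves competence.
        return self.progress.get(goal, 1.0) < self.stall_threshold
```

A fuller version in the spirit of SGIM-ACTS would also compare teachers and demonstration kinds by their empirically measured progress, treating "ask teacher k" as one more strategy competing with autonomous exploration.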
“…Most such autotelic agents are equipped with one or several goal spaces and rely on goal-conditioned RL (Colas et al, 2020b) and automatic curriculum learning (Portelas et al, 2020) to learn to achieve those goals along an open-ended developmental trajectory. This endows them with the capability to decide which goals to target and learn about as a function of their current abilities (Florensa et al, 2018; Fournier et al, 2019; Colas et al, 2019; Racaniere et al, 2019). Thus, by contrast with Interactive RL agents, autotelic agents offer a promising solution to the boundedness issue: if they explore an unbounded set of goals of increasing complexity, they may end up accounting for the open-ended development of children.…”
Section: Autonomous Reinforcement Learners
confidence: 99%
“…Closer to our concerns, the autotelic CLIC agent imitates the behavior of other agents acting in the environment without a pedagogical stance (Fournier et al, 2019). An interesting feature of CLIC is that it relies on a curriculum learning mechanism to decide which goal to imitate from these agents, depending on its current capabilities.…”
Section: Observational Learning
confidence: 99%
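The idea of a curriculum over *which observed goal to imitate* can be sketched as a competence-gated filter. The function below is a hypothetical illustration (the name, the competence bounds, and the tie-breaking rule are assumptions, not CLIC's actual mechanism): among goals an expert like Bob was observed achieving, prefer those at intermediate competence — neither already mastered nor currently out of reach.

```python
def choose_goal_to_imitate(competence, observed_goals, low=0.2, high=0.8):
    """Hypothetical sketch: among goals an expert was observed achieving,
    prefer those at intermediate competence, mirroring a curriculum over
    what to imitate. `competence` maps goal -> success rate in [0, 1]."""
    candidates = [g for g in observed_goals
                  if low <= competence.get(g, 0.0) <= high]
    if not candidates:
        # Nothing at intermediate difficulty: fall back to the
        # least-mastered observed goal.
        return min(observed_goals, key=lambda g: competence.get(g, 0.0))
    # Among intermediate goals, pick the least mastered for most headroom.
    return min(candidates, key=lambda g: competence[g])
```

Gating imitation on the learner's own competence is what makes this a curriculum rather than blind mimicry: the same observed behavior is ignored early (too hard) and late (already mastered), and imitated only in between.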