2018
DOI: 10.48550/arxiv.1810.06284
Preprint

CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning

Abstract: In open-ended environments, autonomous learning agents must set their own goals and build their own curriculum through an intrinsically motivated exploration. They may consider a large diversity of goals, aiming to discover what is controllable in their environments, and what is not. Because some goals might prove easy and some impossible, agents must actively select which goal to practice at any moment, to maximize their overall mastery on the set of learnable goals. This paper proposes CURIOUS, an algorithm …
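
The abstract's point about actively selecting which goal to practice can be illustrated with a short sketch. CURIOUS tracks competence per goal module and favors modules whose competence is changing the most (absolute learning progress). The sketch below is only an illustration of that idea under stated assumptions, not the paper's implementation: the `GoalModuleSelector` class, its window size, and the epsilon-greedy rule are all hypothetical choices made here for clarity.

```python
import random
from collections import deque

class GoalModuleSelector:
    """Illustrative sketch (hypothetical class, not the authors' code):
    sample goal modules in proportion to absolute learning progress (LP),
    with an epsilon-greedy chance of picking a random module."""

    def __init__(self, n_modules, window=100, eps=0.2):
        self.n_modules = n_modules
        self.eps = eps
        # Recent success/failure history per module, used to estimate competence.
        self.history = [deque(maxlen=window) for _ in range(n_modules)]

    def update(self, module, success):
        """Record the outcome (True/False) of an episode aimed at `module`."""
        self.history[module].append(float(success))

    def _learning_progress(self, module):
        """Absolute difference in success rate between the newer and older
        halves of the recent history; 0 if there is too little data."""
        h = list(self.history[module])
        if len(h) < 4:
            return 0.0
        half = len(h) // 2
        old, new = h[:half], h[half:]
        return abs(sum(new) / len(new) - sum(old) / len(old))

    def sample_module(self):
        """With probability eps pick a random module (keep exploring);
        otherwise sample proportionally to absolute learning progress."""
        if random.random() < self.eps:
            return random.randrange(self.n_modules)
        lp = [self._learning_progress(m) for m in range(self.n_modules)]
        if sum(lp) == 0.0:
            return random.randrange(self.n_modules)
        return random.choices(range(self.n_modules), weights=lp, k=1)[0]
```

An agent would call sample_module() before each episode and update() after it; modules whose competence is rising or dropping fastest get practiced most, while mastered and impossible goals (flat competence) are sampled mainly through the random-exploration branch.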

Cited by 11 publications (17 citation statements) · References 25 publications
“…Finally, some prior work considers exploration in the space of goals [Colas et al, 2018, Held et al, 2017, Nair et al, 2018, Pong et al, 2019]. In Appendix D.3, we also discuss how goal-conditioned RL [Kaelbling, 1993, Schaul et al, 2015] can be viewed as a special case of State Marginal Matching when the goal-sampling distribution is learned jointly with the policy.…”
Section: Related Work (mentioning)
confidence: 99%
“…Empowerment and reward-free RL. Task-agnostic reward functions have been proposed to encourage exploration in environments using notions of curiosity or novelty (Schmidhuber, 1991; Oudeyer & Kaplan, 2009; Schmidhuber, 2010; Bellemare et al, 2016; Pathak et al, 2017; Colas et al, 2018). In a similar vein, some methods maximize the state-visitation entropy (Hazan et al, 2018; Pong et al, 2019; Lee et al, 2019; Ghasemipour et al, 2019).…”
Section: Related Work (mentioning)
confidence: 99%
“…The current state is the concatenation of the current observations and the difference between the current observations and the initial observations, s_t = [o_t, Δo_t] with Δo_t = o_t − o_0. We consider fixed-length episodes, where the agent sets its own goal for the whole episode. Our agent belongs to the framework of IMGEP [25] and is based on the CURIOUS [22] algorithm. In the same spirit it sets its own goals and uses intrinsic motivations to guide its learning trajectory.…”
Section: Language Enhanced Explorer (mentioning)
confidence: 99%
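
The state construction quoted in the last excerpt (current observation concatenated with its change since the start of the episode) amounts to a one-liner. The sketch below assumes NumPy observation vectors; the `build_state` helper name is hypothetical and only restates the excerpt's formula.

```python
import numpy as np

def build_state(o_t: np.ndarray, o_0: np.ndarray) -> np.ndarray:
    """Hypothetical helper: s_t = [o_t, Δo_t] with Δo_t = o_t - o_0,
    i.e. the current observation concatenated with its change since reset."""
    delta_o_t = o_t - o_0
    return np.concatenate([o_t, delta_o_t])

# Example with 3-dimensional observations:
o_0 = np.array([0.0, 0.5, 1.0])   # observation at episode start
o_t = np.array([0.2, 0.5, 0.4])   # current observation
s_t = build_state(o_t, o_0)        # -> [0.2, 0.5, 0.4, 0.2, 0.0, -0.6]
```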