Robotics: Science and Systems XV 2019
DOI: 10.15607/rss.2019.xv.020

Unsupervised Visuomotor Control through Distributional Planning Networks

Abstract: While reinforcement learning (RL) has the potential to enable robots to autonomously acquire a wide range of skills, in practice, RL usually requires manual, per-task engineering of reward functions, especially in real-world settings where aspects of the environment needed to compute progress are not directly accessible. To enable robots to autonomously learn skills, we instead consider the problem of reinforcement learning without access to rewards. We aim to learn an unsupervised embedding space under which …
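The abstract is truncated above, but its stated aim, acquiring skills without hand-engineered rewards via an unsupervised embedding, can be made concrete with a minimal sketch. Everything below (the network `phi`, the sizes, the reward definition) is a hypothetical illustration of the general idea, not the paper's actual architecture or objective.

```python
import torch
import torch.nn as nn

# Hypothetical embedding network; architecture and sizes are illustrative only.
phi = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64 * 3, 256), nn.ReLU(),
    nn.Linear(256, 32),
)

def unsupervised_reward(o_t: torch.Tensor, o_goal: torch.Tensor) -> torch.Tensor:
    """Negative distance to a goal image in the learned embedding space,
    standing in for a manually engineered, per-task reward function."""
    with torch.no_grad():
        return -torch.norm(phi(o_t) - phi(o_goal), dim=-1)
```

A task can then be specified with a single goal image rather than instrumented state, which is the setting the abstract describes.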

Cited by 25 publications (16 citation statements) | References 40 publications

“…Several prior works [50,56,59,44] maximize MI objectives that closely resemble the forward information objective we introduce in Section 4, while others optimize related objectives by learning latent forward dynamics models [69,33,73,26,39]. Multi-step inverse models, closely related to the inverse information objective (Section 4), have been used to learn control-centric representations [70,23]. Single-step inverse models have been deployed as regularization of forward models [72,2] and as an auxiliary loss for policy gradient RL [57,52].…”
Section: Mutual Information Objectives in RL (mentioning)
confidence: 99%
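As a concrete, entirely illustrative instance of the latent forward dynamics models the statement above refers to: encode observations, predict the next latent from the current latent and action, and regress. None of the names or sizes below come from the cited papers, and the cited objectives typically pair such a term with contrastive or reconstruction losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, LATENT = 32, 4, 16  # illustrative sizes

enc = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT))
dyn = nn.Sequential(nn.Linear(LATENT + ACT_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT))

def forward_dynamics_loss(o_t, a_t, o_next):
    """Predict the next latent from (z_t, a_t). The target latent is detached;
    on its own this loss can collapse, which is why the works cited above add
    contrastive or reconstruction terms."""
    z_pred = dyn(torch.cat([enc(o_t), a_t], dim=-1))
    return F.mse_loss(z_pred, enc(o_next).detach())
```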
“…We suggestively name this objective "inverse information" due to the second term, which is the entropy of the inverse dynamics. A wide range of prior work learns representations by optimizing closely related objectives [23,57,2,52,70,72]. Intuitively, inverse models allow the representation to capture only the elements of the state that are necessary to predict the action, allowing it to discard potentially irrelevant information.…”
Section: Mutual Information for Representation Learning in RL (mentioning)
confidence: 99%
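A minimal sketch of the single-step inverse models described above, assuming discrete actions; all names and dimensions are hypothetical. Minimizing the cross-entropy of the predicted action reduces the entropy of the inverse dynamics under the representation, so `phi` only needs to retain action-relevant features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, N_ACTIONS, LATENT = 32, 4, 16  # illustrative sizes

phi = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT))
inv = nn.Linear(2 * LATENT, N_ACTIONS)  # inverse model: (z_t, z_{t+1}) -> action

def inverse_loss(o_t, o_next, a_t):
    """Cross-entropy of the inverse model; a_t holds integer action indices.
    The representation is pushed to keep exactly the state features needed
    to tell which action was taken between the two observations."""
    logits = inv(torch.cat([phi(o_t), phi(o_next)], dim=-1))
    return F.cross_entropy(logits, a_t)
```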
“…Finally, there are efforts to learn more sophisticated cost functions over input images C_θ(ô, o_g) [16,20,21,30,23]. For example, Nair et al. [16] train a latent representation to focus on portions of the image that are different between the goal and the current image, and show that costs computed over these latents permit better control on one robot.…”
Section: Robot-aware Planning Costs (mentioning)
confidence: 99%
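One illustrative reading of a learned cost C_θ(ô, o_g), in the spirit of (but not copied from) the works cited above: embed the predicted and goal images and compare them in latent space rather than in raw pixels. The encoder and sizes here are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical image encoder; the cited papers use task-specific architectures.
enc = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64 * 3, 256), nn.ReLU(),
    nn.Linear(256, 32),
)

def planning_cost(o_pred: torch.Tensor, o_goal: torch.Tensor) -> torch.Tensor:
    """C_theta(o_hat, o_g): squared distance between a predicted future image
    and the goal image in latent space, ignoring pixel-level nuisance detail."""
    return ((enc(o_pred) - enc(o_goal)) ** 2).sum(dim=-1)
```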
“…Some works use visual affordances as an auxiliary signal to estimate and adjust the joint configuration of robots in different tasks [26], [27]. There are also works attempting to directly optimise action trajectories from raw input images [28], or by learning embeddings without supervision [29], [30], [31]. However, interactions among objects are not considered in these works.…”
Section: Related Work (mentioning)
confidence: 99%
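To make "directly optimising action trajectories via raw input images" concrete, a common recipe (not necessarily the one used in [28]) is cross-entropy-method planning: sample action sequences, score imagined rollouts with a learned cost such as the C_θ sketch above, and refit the sampling distribution to the elites. The rollout cost below is a dummy stand-in so the sketch stays runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(actions: np.ndarray) -> float:
    """Stand-in for: roll a learned visual dynamics model forward under
    `actions`, then score the predicted images with a learned cost."""
    return float((actions ** 2).sum())  # dummy quadratic cost

def cem_plan(horizon=5, act_dim=2, iters=10, pop=64, n_elite=8):
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, horizon, act_dim))
        costs = np.array([rollout_cost(s) for s in samples])
        elite = samples[np.argsort(costs)[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]  # execute the first action, then replan (MPC style)
```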