2020
DOI: 10.48550/arxiv.2011.14381
Preprint
Self-supervised Visual Reinforcement Learning with Object-centric Representations

Abstract: Autonomous agents need large repertoires of skills to act reasonably on new tasks that they have not seen before. However, acquiring these skills using only a stream of high-dimensional, unstructured, and unlabeled observations is a tricky challenge for any autonomous agent. Previous methods have used variational autoencoders to encode a scene into a low-dimensional vector that can be used as a goal for an agent to discover new skills. Nevertheless, in compositional/multiobject environments it is difficult to …

Cited by 2 publications (2 citation statements)
References 11 publications
“…If such benefits can be extended across agents, fair comparisons in RL should require listing the amount of compute resources used for novel methods. This is especially important for compute-intensive unsupervised methods [1,38,43,58,70,72] or model-based learning [4,12,20,27,41]. There are two common approaches for ensuring fair comparisons: using a standard architecture across algorithms or listing the amount of compute/memory consumption and compare methods on this basis.…”
Section: Discussion
Citation type: mentioning (confidence: 99%)
“…However, collecting real interaction trajectories is very time-consuming, while physics parameters in real situations can only be approximated, making the model-based methods hard to apply. A model-free method like RL can be used to get actions directly from the ground truth state (Peng et al, 2018), or raw pixel images (Zadaianchuk et al, 2020). However, generalization of the model to different manipulation objects is hard for ground truth input, while extracting useful features such as object shape, size, and the robot's relative position efficiently from raw images for a subsequent policy network is always tricky.…”
Section: Planar Object Pushing
Citation type: mentioning (confidence: 99%)