Continuous control RL policies are commonly parameterized as Gaussians with diagonal covariance matrices [31,30,11], though other parameterizations have been considered, including full-covariance Gaussians parameterized via the Cholesky factor [1], Gaussian mixtures [38], Beta distributions [3], and Bernoulli distributions [32]. Rather than directly outputting continuous values, an alternative way of parameterizing a continuous control policy is via discretization, whether through growing action spaces [4] or coarse-to-fine networks [16]. All of these works share the common goal of moving away from the conventional Gaussian parameterization, but none are ideal for action spaces that require rotation predictions.
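For context on the conventional parameterization the works above depart from, the following is a minimal sketch of a diagonal-Gaussian policy head; the function name, shapes, and constants are illustrative and not taken from any cited work:

```python
import numpy as np

def sample_diag_gaussian_action(mean, log_std, rng):
    """Sample an action from a Gaussian with diagonal covariance.

    mean, log_std: per-dimension outputs of the policy network.
    The covariance is diag(exp(log_std)**2), so action dimensions are
    sampled independently -- the property that makes this
    parameterization a poor fit for coupled quantities such as
    rotations.
    """
    std = np.exp(log_std)
    action = mean + std * rng.standard_normal(mean.shape)
    # Log-density of the sample under the diagonal Gaussian,
    # as used in policy-gradient updates.
    log_prob = -0.5 * np.sum(
        ((action - mean) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi)
    )
    return action, log_prob

rng = np.random.default_rng(0)
action, log_prob = sample_diag_gaussian_action(
    np.zeros(3), np.full(3, -1.0), rng
)
```

Because each dimension is independent, the log-probability factorizes into a per-dimension sum, which keeps the likelihood cheap to evaluate but cannot capture correlations between action dimensions.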