Continuous control RL policies are commonly parameterized as Gaussians with diagonal covariance matrices [31,30,11], though other parameterizations have been considered, including full-covariance Gaussians parameterized via the Cholesky factor [1], Gaussian mixtures [38], Beta distributions [3], and Bernoulli distributions [32]. Rather than directly outputting continuous values, an alternative way of parameterizing a continuous control policy is via discretization, whether through growing action spaces [4] or coarse-to-fine networks [16]. All of these works share the common goal of moving away from the conventional Gaussian parameterization, but none are ideal for action spaces that require rotation predictions.
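For context on the conventional parameterization the works above depart from, the following is a minimal sketch of a diagonal-Gaussian policy head; the function name, shapes, and constants are illustrative and not taken from any cited work:

```python
import numpy as np

def sample_diag_gaussian_action(mean, log_std, rng):
    """Sample an action from a Gaussian with diagonal covariance.

    mean, log_std: per-dimension outputs of the policy network.
    The covariance is diag(exp(log_std)**2), so action dimensions are
    sampled independently -- the property that makes this
    parameterization a poor fit for coupled quantities such as
    rotations.
    """
    std = np.exp(log_std)
    action = mean + std * rng.standard_normal(mean.shape)
    # Log-density of the sample under the diagonal Gaussian,
    # as used in policy-gradient updates.
    log_prob = -0.5 * np.sum(
        ((action - mean) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi)
    )
    return action, log_prob

rng = np.random.default_rng(0)
action, log_prob = sample_diag_gaussian_action(
    np.zeros(3), np.full(3, -1.0), rng
)
```

Because each dimension is independent, the log-probability factorizes into a per-dimension sum, which keeps the likelihood cheap to evaluate but cannot capture correlations between action dimensions.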