Group Equivariant Deep Reinforcement Learning

Mondal, Arnab Kumar; Nair, Pratheeksha; Siddiqi, Kaleem

doi:10.48550/arxiv.2007.03437

Cited by 4 publications

(5 citation statements)

References 0 publications


“…However, we restrict ourselves to a model-free approach where the model-based machinery presented in [18] does not apply. Finally, we mention that there is some literature on exploiting equivalence in deep RL (e.g., [21, 22]). However, none of these works study provably efficient learning methods to our best knowledge.…”

Section: Related Work (mentioning)

confidence: 99%

“…As a result, the learning performance would depend on the effective size of the state space (or the effective number of unknown parameters). Various notions of structures have been studied in MDPs, which include the Lipschitz continuity of MDP parameters (e.g., rewards and transition functions) [10, 11, 12, 13], factorization structure [14, 15, 16], and equivalence relations [17, 18, 19, 20, 21, 22]. These works reveal that exploiting the underlying structure in the environment in various RL tasks leads to massive empirical performance gain (over structure-oblivious algorithms) and to significantly improved performance bounds.…”

Section: Introduction (mentioning)

confidence: 99%


Scaling Up Q-Learning via Exploiting State–Action Equivalence

Lyu, Côme, Zhang et al. 2023

Entropy


Recent success stories in reinforcement learning have demonstrated that leveraging structural properties of the underlying environment is key in devising viable methods capable of solving complex tasks. We study off-policy learning in discounted reinforcement learning, where some equivalence relation in the environment exists. We introduce a new model-free algorithm, called QL-ES (Q-learning with equivalence structure), which is a variant of (asynchronous) Q-learning tailored to exploit the equivalence structure in the MDP. We report a non-asymptotic PAC-type sample complexity bound for QL-ES, thereby establishing its sample efficiency. This bound also allows us to quantify the superiority of QL-ES over Q-learning analytically, which shows that the theoretical gain in some domains can be massive. We report extensive numerical experiments demonstrating that QL-ES converges significantly faster than (structure-oblivious) Q-learning empirically. They imply that the empirical performance gain obtained by exploiting the equivalence structure could be massive, even in simple domains. To the best of our knowledge, QL-ES is the first provably efficient model-free algorithm to exploit the equivalence structure in finite MDPs.
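The idea of sharing what is learned across equivalent state-action pairs can be illustrated with a small tabular sketch. This is not the QL-ES algorithm from the paper, whose details are not given here; it is ordinary Q-learning in which the table is indexed by an equivalence class instead of a raw state-action pair. The five-state chain, its mirror symmetry, and the helper names `step`, `canon`, `q_value`, and `greedy` are all hypothetical, and the equivalence map is assumed to be known in advance.

```python
import random

N = 5                      # hypothetical chain with states 0..4
GAMMA, ALPHA = 0.9, 0.1    # discount factor and learning rate (illustrative values)

def step(s, a):
    """Toy dynamics: action 0 moves left, action 1 moves right; reward 1 on reaching either end."""
    nxt = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if nxt in (0, N - 1) else 0.0
    return nxt, reward

def canon(s, a):
    """Assumed-known equivalence: (s, a) is equivalent to its mirror image (N-1-s, 1-a)."""
    mirrored = (N - 1 - s, 1 - a)
    return min((s, a), mirrored)   # pick one representative per class

q = {}  # one entry per equivalence class, not per state-action pair

def q_value(s, a):
    return q.get(canon(s, a), 0.0)

def greedy(s):
    return max((0, 1), key=lambda a: q_value(s, a))

rng = random.Random(0)
s = rng.randrange(N)
for _ in range(20000):
    a = rng.randrange(2)   # uniform behaviour policy (off-policy learning)
    nxt, r = step(s, a)
    target = r + GAMMA * max(q_value(nxt, b) for b in (0, 1))
    key = canon(s, a)
    q[key] = q.get(key, 0.0) + ALPHA * (target - q.get(key, 0.0))
    s = nxt
```

In this toy chain there are 10 state-action pairs but only 5 equivalence classes, so every sample updates the estimate for two pairs at once. That halving of the table is the kind of reduction in effective problem size that the abstract refers to.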


“…The efficacy of equivariant deep learning has been explored in the field of medical imaging [21], object detection [22], aircraft detection [23] and reinforcement learning [24]. To the best of our knowledge, this concept has not been applied to feature description to obtain descriptors that are rotation equivariant.…”

Section: B Rotation Equivariance Network (mentioning)

confidence: 99%

ReF -- Rotation Equivariant Features for Local Feature Matching

Peri, Mehta, Mishra et al. 2022

Preprint


“…In reinforcement learning, some recent work applies equivariant models to structure-finding problems involving MDP homomorphisms [15,16]. In addition, Mondal et al [17] recently applied an E(2)-equivariant model to Q learning in an Atari game domain, but showed limited improvement. To our knowledge, equivariant model architectures have not been explored in the context of robotics applications.…”

Section: Related Work (mentioning)

confidence: 99%

Equivariant $Q$ Learning in Spatial Action Spaces

Wang, Walters, Zhu et al. 2021

Preprint


Recently, a variety of new equivariant neural network model architectures have been proposed that generalize better over rotational and reflectional symmetries than standard models. These models are relevant to robotics because many robotics problems can be expressed in a rotationally symmetric way. This paper focuses on equivariance over a visual state space and a spatial action space, the setting where the robot action space includes a subset of SE(2). In this situation, we know a priori that rotations and translations in the state image should result in the same rotations and translations in the spatial action dimensions of the optimal policy. Therefore, we can use equivariant model architectures to make Q learning more sample efficient. This paper identifies when the optimal Q function is equivariant and proposes Q network architectures for this setting. We show experimentally that this approach outperforms standard methods in a set of challenging manipulation problems.
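The equivariance property described in this abstract (rotating the state image rotates the Q-map over spatial actions in the same way) can be checked numerically for the four 90-degree rotations. The sketch below is an illustration under stated assumptions, not the architecture from the paper: `toy_q_map` is a hypothetical stand-in for a Q network that assigns one value per pixel action by averaging the four neighbouring pixels, and `biased_q_map` is a hypothetical counterexample that breaks the symmetry.

```python
import random

def rot90(grid):
    """Rotate a square grid (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]

def toy_q_map(state):
    """Hypothetical Q 'network': 4-neighbour average with zero padding, one Q-value per pixel."""
    n = len(state)
    def at(i, j):
        return state[i][j] if 0 <= i < n and 0 <= j < n else 0.0
    return [[(at(i - 1, j) + at(i + 1, j) + at(i, j - 1) + at(i, j + 1)) / 4.0
             for j in range(n)] for i in range(n)]

def biased_q_map(state):
    """Counterexample: adding a left-to-right ramp makes the map depend on orientation."""
    base = toy_q_map(state)
    return [[base[i][j] + j for j in range(len(state))] for i in range(len(state))]

def is_c4_equivariant(q_fn, state, tol=1e-9):
    """Check Q(rotate(state)) == rotate(Q(state)) for all four 90-degree rotations."""
    rotated_state, rotated_q = state, q_fn(state)
    for _ in range(4):
        rotated_state, rotated_q = rot90(rotated_state), rot90(rotated_q)
        q_of_rotated = q_fn(rotated_state)
        for row_a, row_b in zip(q_of_rotated, rotated_q):
            if any(abs(x - y) > tol for x, y in zip(row_a, row_b)):
                return False
    return True

rng = random.Random(0)
state = [[rng.random() for _ in range(8)] for _ in range(8)]
equivariant_ok = is_c4_equivariant(toy_q_map, state)   # symmetric kernel passes the check
biased_ok = is_c4_equivariant(biased_q_map, state)     # orientation-dependent map fails it
```

An equivariant architecture enforces this constraint by construction, so it does not have to be learned from data. That is the source of the sample-efficiency argument in the abstract.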
