2022
DOI: 10.48550/arxiv.2202.03957
Preprint

Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning

Abstract: We propose a new policy parameterization for representing 3D rotations during reinforcement learning. Today in the continuous control reinforcement learning literature, many stochastic policy parameterizations are Gaussian. We argue that universally applying a Gaussian policy parameterization is not always desirable for all environments. One such case in particular where this is true is tasks that involve predicting a 3D rotation output, either in isolation, or coupled with translation as part of a full 6D po…
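The abstract argues for replacing a Gaussian policy head with a Bingham distribution over unit quaternions. To make that concrete, the sketch below draws samples from a Bingham density p(x) proportional to exp(x^T M diag(Z) M^T x) on the unit 3-sphere by simple rejection sampling. This is an illustrative, assumed construction: the parameter names, the convention max(Z) = 0, and the uniform-proposal sampler are not taken from the paper, which uses its own training-time machinery.

    import numpy as np

    def sample_bingham(M, Z, n_samples, rng=None):
        # Rejection-sample unit quaternions from a Bingham density
        # p(x) proportional to exp(x^T M diag(Z) M^T x) on the 3-sphere.
        # Assumed convention: M is a 4x4 orthogonal matrix of principal
        # directions and Z holds non-positive concentrations with max(Z) == 0,
        # so the unnormalised density is bounded by 1 and a uniform proposal
        # on the sphere is a valid rejection envelope.
        rng = np.random.default_rng() if rng is None else rng
        A = M @ np.diag(Z) @ M.T
        samples = []
        while len(samples) < n_samples:
            x = rng.normal(size=4)
            x = x / np.linalg.norm(x)   # uniform draw on S^3
            log_f = x @ A @ x           # unnormalised log-density, always <= 0
            if np.log(rng.uniform()) < log_f:
                samples.append(x)
        return np.stack(samples)

    # Example: density concentrated around the identity quaternion [1, 0, 0, 0].
    # Note that q and -q receive equal probability, matching the antipodal
    # symmetry of unit-quaternion rotation representations.
    M = np.eye(4)
    Z = np.array([0.0, -5.0, -5.0, -5.0])
    quats = sample_bingham(M, Z, 100)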

Cited by 4 publications (10 citation statements)
References 28 publications (49 reference statements)
“…In our case, orientation is represented by unit quaternions. Figure 3 shows the results of learning the orientation represented as a unit quaternion using Gaussian Policy Parameterization (GPP), Tangent Space Gaussian Policy Parameterization (TSGPP), and Bingham Policy Parameterization (BPP) [34]. The quality of the learned policy using TSGPP was better than GPP for both SAC [21] and PPO [22], while compared to BPP a slightly better policy was learned for SAC and a comparable policy was learned for PPO.…”
Section: Results (mentioning)
confidence: 99%
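For context on the tangent-space Gaussian parameterization (TSGPP) mentioned in the statement above: one common way to realise the idea is to sample a 3D rotation-vector perturbation from a Gaussian and push it onto the unit-quaternion manifold through the quaternion exponential map. The sketch below is an assumed, simplified illustration with hypothetical names; it is not code from the cited works.

    import numpy as np

    def quat_exp(omega):
        # Map a rotation vector omega (axis * angle, radians) to a unit quaternion [w, x, y, z].
        angle = np.linalg.norm(omega)
        if angle < 1e-12:
            return np.array([1.0, 0.0, 0.0, 0.0])
        axis = omega / angle
        return np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))

    def quat_mul(a, b):
        # Hamilton product of two quaternions stored as [w, x, y, z].
        w1, x1, y1, z1 = a
        w2, x2, y2, z2 = b
        return np.array([
            w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2,
        ])

    def sample_tangent_space_gaussian(q_mean, mean, cov, rng=None):
        # Sample a 3D perturbation in the tangent space at q_mean and
        # compose it with the mean orientation via the exponential map.
        rng = np.random.default_rng() if rng is None else rng
        omega = rng.multivariate_normal(mean, cov)
        return quat_mul(q_mean, quat_exp(omega))

    # Example: small random rotations around the identity orientation.
    q = sample_tangent_space_gaussian(
        np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(3), 0.01 * np.eye(3))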
“…As already noted in [34], BPP parameterization relies on the prediction from multiple neural networks and this may introduce significant approximation errors. Unlike GPP and TSGPP, this culminates in an unstable learning process.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, this is not the first work to propose extensions to ARM. ARM [12] has recently been extended to a Bingham policy parameterization [18] to improve training stability. Another extension has sought to improve the control agent within coarse-to-fine ARM to use learned path ranking [19] to overcome the weaknesses of traditional path planning.…”
Section: Related Work (mentioning)
confidence: 99%