2020
DOI: 10.1016/j.asr.2019.12.030

Deep reinforcement learning for six degree-of-freedom planetary landing

Cited by 154 publications (84 citation statements)
References 38 publications
“…Specifically, the trained policy's hidden state captures unobserved (potentially time-varying) information such as external forces that are useful in minimizing the cost function. In contrast, a non-recurrent policy (which we will refer to as an MLP policy), which does not maintain a persistent hidden state vector, can only optimize using a set of current observations, actions, and advantages, and will tend to under-perform a recurrent policy on tasks with randomized dynamics, although as we have shown in [19], training with parameter uncertainty can give good results using an MLP policy, provided the parameter uncertainty is not too extreme. After training, although the recurrent policy's network weights are frozen, the hidden state will continue to evolve in response to a sequence of observations and actions, thus making the policy adaptive.…”
Section: Robustness
confidence: 99%
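The excerpt above contrasts a recurrent policy, whose persistent hidden state can infer unobserved and time-varying effects, with a memoryless MLP policy. The sketch below is only a rough illustration of that mechanism, not the cited implementation: the layer sizes, weight initialization, and function names are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HID_DIM, ACT_DIM = 12, 32, 4  # illustrative sizes, not from the paper

# MLP policy: the action depends only on the current observation.
W1 = rng.standard_normal((HID_DIM, OBS_DIM)) * 0.1
W2 = rng.standard_normal((ACT_DIM, HID_DIM)) * 0.1

def mlp_policy(obs):
    return np.tanh(W2 @ np.tanh(W1 @ obs))

# Recurrent (GRU-style) policy: a persistent hidden state h summarizes the
# observation/action history, so unmodeled, time-varying effects (e.g. external
# forces) can influence the action even though the weights are frozen.
Wz = rng.standard_normal((HID_DIM, OBS_DIM + HID_DIM)) * 0.1
Wr = rng.standard_normal((HID_DIM, OBS_DIM + HID_DIM)) * 0.1
Wh = rng.standard_normal((HID_DIM, OBS_DIM + HID_DIM)) * 0.1
Wo = rng.standard_normal((ACT_DIM, HID_DIM)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recurrent_policy(obs, h):
    xh = np.concatenate([obs, h])
    z = sigmoid(Wz @ xh)                      # update gate
    r = sigmoid(Wr @ xh)                      # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([obs, r * h]))
    h_new = (1.0 - z) * h + z * h_tilde       # hidden state keeps evolving
    return np.tanh(Wo @ h_new), h_new

h = np.zeros(HID_DIM)
for t in range(5):
    obs = rng.standard_normal(OBS_DIM)        # stand-in for the estimated state
    a_mlp = mlp_policy(obs)                   # no memory of previous steps
    a_rnn, h = recurrent_policy(obs, h)       # h adapts with frozen weights
```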
“…Reinforcement Learning (RL) has recently been successfully applied to landing guidance problems. [9][10][11][12] Importantly, the observations are chosen such that the policy generalizes well to different landing sites. Specifically, the policy can be optimized for a specific landing site, and when deployed can be used for an arbitrary landing site.…”
Section: Initial Conditions
confidence: 99%
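The excerpt above notes that the observations are chosen so the policy generalizes to arbitrary landing sites. One common way to achieve this is to express the lander's translational state in a landing-site-relative frame; the minimal sketch below illustrates that idea only, with frame and variable names that are assumptions rather than definitions taken from the cited papers.

```python
import numpy as np

def target_relative_observation(r_inertial, v_inertial, r_target,
                                C_target_from_inertial):
    """Express position/velocity relative to an arbitrary landing site.

    r_inertial, v_inertial : lander position/velocity in a planet-fixed frame
    r_target               : landing-site position in the same frame
    C_target_from_inertial : rotation into a landing-site-centered frame
    (All names are illustrative; the cited papers define their own frames.)
    """
    r_rel = C_target_from_inertial @ (np.asarray(r_inertial) - np.asarray(r_target))
    v_rel = C_target_from_inertial @ np.asarray(v_inertial)
    return np.concatenate([r_rel, v_rel])
```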
“…In our previous work with the 6-DOF Mars powered descent phase, the policy took less than 1 ms to run the mapping between estimated state and thruster commands (four small matrix multiplications) on a 3 GHz processor. 12 Since in this work the mapping is updated every six seconds, we do not see any issues with running this on the current generation of space-certified flight computers. A diagram illustrating how the policy interfaces with peripheral spacecraft components is shown in Fig.…”
Section: Initial Conditions
confidence: 99%
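The quoted statement characterizes the trained policy's forward pass as a few small matrix multiplications executing in well under a millisecond. The sketch below is one way to see why such a claim is plausible; the layer sizes and timing loop are assumptions for illustration, not the network architecture used in the cited work.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
# "Four small matrix multiplications" corresponds to a network with four
# weight matrices; these layer sizes are illustrative assumptions.
sizes = [12, 64, 64, 64, 4]   # estimated state in, thruster commands out
weights = [rng.standard_normal((n_out, n_in)) * 0.1
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def policy(x):
    for W in weights[:-1]:
        x = np.tanh(W @ x)
    return np.tanh(weights[-1] @ x)

x = rng.standard_normal(sizes[0])
policy(x)                                      # warm-up
t0 = time.perf_counter()
N = 1000
for _ in range(N):
    policy(x)
elapsed_ms = 1000.0 * (time.perf_counter() - t0) / N
print(f"mean forward pass: {elapsed_ms:.3f} ms")  # typically far below 1 ms
```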
“…In this proposal, we use a different approach [11] based on a recurrent policy and value function. Note that it is possible to train over a wide range of POMDPs using a non-meta RL algorithm [18]. Although such an approach typically results in a robust policy, the policy cannot adapt in real time to novel environments.…”
Section: RL Overview
confidence: 99%