2019
DOI: 10.1609/aaai.v33i01.33017530

Deep Reactive Policies for Planning in Stochastic Nonlinear Domains

Abstract: Recent advances in applying deep learning to planning have shown that Deep Reactive Policies (DRPs) can be powerful for fast decision-making in complex environments. However, an important limitation of current DRP-based approaches is either the need of optimal planners to be used as ground truth in a supervised learning setting or the sample complexity of high-variance policy gradient estimators, which are particularly troublesome in continuous state-action domains. In order to overcome those limitations, we i…
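As context for the limitation the abstract mentions, below is a minimal, hypothetical sketch of a score-function (REINFORCE) policy-gradient estimator for a one-step continuous problem with a Gaussian policy; its gradient estimate is built from sampled returns and therefore tends to be noisy, which is the high-variance issue alluded to. All names here (`reward`, `mu`, `sigma`, the sample size) are illustrative assumptions, not code from the paper.

```python
import torch

mu = torch.zeros(2, requires_grad=True)        # mean of a Gaussian policy over actions
sigma = 0.5                                    # fixed exploration noise (std dev)

def reward(action):
    # Hypothetical reward, peaked at action = (1, -1).
    return -((action - torch.tensor([1.0, -1.0])) ** 2).sum(dim=-1)

N = 256                                        # number of sampled actions
actions = (mu + sigma * torch.randn(N, 2)).detach()  # samples, treated as constants
returns = reward(actions)                      # returns carry no gradient

# Log-density of each sample under the Gaussian policy (up to a constant in mu).
log_prob = -((actions - mu) ** 2).sum(dim=-1) / (2 * sigma ** 2)
surrogate = (log_prob * returns).mean()        # REINFORCE surrogate objective
surrogate.backward()                           # mu.grad is the policy-gradient estimate
print(mu.grad)                                 # noisy estimate; variance shrinks roughly as 1/N
```

The estimate only uses sampled returns and log-density gradients, so its variance depends on how spread out those returns are, which is why continuous, long-horizon domains make it expensive.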

Cited by 22 publications (48 citation statements); References: 14 publications.
“…We focused on continuous stochastic domains with concurrent actions and exogenous events exhibiting nonlinear transition and cost functions. We presented the results published in (Bueno et al., 2019) and showed that training large DRPs with hundreds of thousands of continuous action parameters can be carried out within minutes without the need for high-performance hardware. Finally, comparing the DRPs trained by our approach with online state-of-the-art gradient-based planners, we observed a speedup of several orders of magnitude in the time to select actions, which highlights the potential of DRPs for fast decision-making in continuous domains.…”
Section: Discussion (mentioning)
confidence: 99%
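The statement above refers to training DRPs by gradient-based optimization. Below is a minimal sketch, under assumed toy dynamics and cost (`transition`, `cost`, and all sizes are illustrative placeholders, not the cited work's benchmark domains), of the general idea of training a neural-network policy by backpropagating accumulated trajectory cost through sampled differentiable rollouts.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HORIZON, BATCH = 3, 2, 20, 64

# The DRP: a neural network mapping states to continuous actions.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.Tanh(),
    nn.Linear(64, ACTION_DIM),
)

A = 0.1 * torch.randn(ACTION_DIM, STATE_DIM)   # fixed toy dynamics matrix (assumption)

def transition(state, action):
    # Placeholder nonlinear stochastic transition with reparameterized noise,
    # so gradients flow through both state and action.
    return state + torch.tanh(action) @ A + 0.01 * torch.randn_like(state)

def cost(state, action):
    # Placeholder quadratic cost on states and actions.
    return (state ** 2).sum(dim=-1) + 0.1 * (action ** 2).sum(dim=-1)

opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
for epoch in range(200):
    state = torch.randn(BATCH, STATE_DIM)      # batch of sampled initial states
    total = torch.zeros(())
    for t in range(HORIZON):
        action = policy(state)
        total = total + cost(state, action).mean()
        state = transition(state, action)
    opt.zero_grad()
    total.backward()                           # end-to-end gradient through the rollout
    opt.step()
```

Because the noise enters the transition in reparameterized form, the gradient of the expected cost with respect to the policy weights can be estimated by ordinary backpropagation over the sampled rollouts, without a high-variance score-function estimator.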
“…Examples of methods that approximate the original nonlinear planning problem through discretization include Monte-Carlo Tree Search (MCTS) (Chang et al., 2005; Kocsis and Szepesvári, 2006), numeric planning, and Q-learning (Watkins and Dayan, 1992), as well as approaches that resort to first- and/or second-order approximations such as Symbolic Dynamic Programming (SDP) (Sanner et al., 2011; Vianna et al., 2015; Zamani et al., 2012), MILP-based planning (Say, 2021; Say et al., 2017), and differentiable dynamic programming (DDP) (Jacobson and Mayne, 1970) and iLQG (Li and Todorov, 2004) from the optimal control literature. In contrast, methods that are general enough to avoid the need to approximate the original nonlinear problem, at the cost of settling for approximate solutions, include differentiable planning (i.e., methods based on planning through backpropagation, such as the TensorPlan planner (Wu et al., 2017) for optimizing plans and Deep Reactive Policies (Bueno et al., 2019) for learning policies parametrized as neural networks in continuous stochastic problems) and model-free policy gradients (Sutton et al., 1999; Williams and Zipser, 1995). Figure 1.2 classifies these approaches according to the nature of the approximations leveraged in each method.…”
Section: Problem Approximations and Approximate Solutions (mentioning)
confidence: 99%
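For contrast with the policy-learning style sketched earlier, here is a hypothetical sketch of the open-loop "plan optimization" flavor of planning through backpropagation mentioned above, where the decision variables are the action sequence itself rather than policy-network weights. The dynamics and cost are toy placeholders, not the cited planners' models.

```python
import torch

STATE_DIM, ACTION_DIM, HORIZON = 3, 2, 20

# Decision variables: one action vector per time step (an open-loop plan).
plan = torch.zeros(HORIZON, ACTION_DIM, requires_grad=True)
B = 0.1 * torch.randn(ACTION_DIM, STATE_DIM)   # toy dynamics matrix (assumption)

def transition(state, action):
    # Placeholder differentiable nonlinear dynamics.
    return state + torch.tanh(action) @ B

def cost(state, action):
    # Placeholder quadratic cost.
    return (state ** 2).sum() + 0.1 * (action ** 2).sum()

opt = torch.optim.Adam([plan], lr=0.05)
for step in range(300):
    state = torch.ones(STATE_DIM)              # fixed initial state
    total = torch.zeros(())
    for t in range(HORIZON):
        total = total + cost(state, plan[t])
        state = transition(state, plan[t])
    opt.zero_grad()
    total.backward()                           # backpropagate total cost through time
    opt.step()
```

The optimized `plan` is specific to the chosen initial state, whereas a DRP amortizes the optimization into a state-conditioned network, which is the distinction the quoted passage draws.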