2019
DOI: 10.1609/aaai.v33i01.33017530

Deep Reactive Policies for Planning in Stochastic Nonlinear Domains

Abstract: Recent advances in applying deep learning to planning have shown that Deep Reactive Policies (DRPs) can be powerful for fast decision-making in complex environments. However, an important limitation of current DRP-based approaches is either the need of optimal planners to be used as ground truth in a supervised learning setting or the sample complexity of high-variance policy gradient estimators, which are particularly troublesome in continuous state-action domains. In order to overcome those limitations, we i…
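As context for the limitation the abstract mentions, below is a minimal, hypothetical sketch of a score-function (REINFORCE) policy-gradient estimator for a one-step continuous problem with a Gaussian policy; its gradient estimate is built from sampled returns and therefore tends to be noisy, which is the high-variance issue alluded to. All names here (`reward`, `mu`, `sigma`, the sample size) are illustrative assumptions, not code from the paper.

```python
import torch

mu = torch.zeros(2, requires_grad=True)        # mean of a Gaussian policy over actions
sigma = 0.5                                    # fixed exploration noise (std dev)

def reward(action):
    # Hypothetical reward, peaked at action = (1, -1).
    return -((action - torch.tensor([1.0, -1.0])) ** 2).sum(dim=-1)

N = 256                                        # number of sampled actions
actions = (mu + sigma * torch.randn(N, 2)).detach()  # samples, treated as constants
returns = reward(actions)                      # returns carry no gradient

# Log-density of each sample under the Gaussian policy (up to a constant in mu).
log_prob = -((actions - mu) ** 2).sum(dim=-1) / (2 * sigma ** 2)
surrogate = (log_prob * returns).mean()        # REINFORCE surrogate objective
surrogate.backward()                           # mu.grad is the policy-gradient estimate
print(mu.grad)                                 # noisy estimate; variance shrinks roughly as 1/N
```

The estimate only uses sampled returns and log-density gradients, so its variance depends on how spread out those returns are, which is why continuous, long-horizon domains make it expensive.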

Cited by 22 publications (48 citation statements); References: 14 publications.
“…We focused on continuous stochastic domains with concurrent actions and exogenous events exhibiting nonlinear transition and cost functions. We presented the results published in (Bueno et al., 2019) and showed that training large DRPs with hundreds of thousands of continuous action parameters can be carried out within minutes without the need for high-performance hardware. Finally, comparing the DRPs trained by our approach with online state-of-the-art gradient-based planners, we observed a speedup of several orders of magnitude in the time to select actions, which highlights the potential of DRPs for fast decision-making in continuous domains.…”
Section: Discussion (mentioning)
confidence: 99%
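The statement above refers to training DRPs by gradient-based optimization. Below is a minimal sketch, under assumed toy dynamics and cost (`transition`, `cost`, and all sizes are illustrative placeholders, not the cited work's benchmark domains), of the general idea of training a neural-network policy by backpropagating accumulated trajectory cost through sampled differentiable rollouts.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HORIZON, BATCH = 3, 2, 20, 64

# The DRP: a neural network mapping states to continuous actions.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.Tanh(),
    nn.Linear(64, ACTION_DIM),
)

A = 0.1 * torch.randn(ACTION_DIM, STATE_DIM)   # fixed toy dynamics matrix (assumption)

def transition(state, action):
    # Placeholder nonlinear stochastic transition with reparameterized noise,
    # so gradients flow through both state and action.
    return state + torch.tanh(action) @ A + 0.01 * torch.randn_like(state)

def cost(state, action):
    # Placeholder quadratic cost on states and actions.
    return (state ** 2).sum(dim=-1) + 0.1 * (action ** 2).sum(dim=-1)

opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
for epoch in range(200):
    state = torch.randn(BATCH, STATE_DIM)      # batch of sampled initial states
    total = torch.zeros(())
    for t in range(HORIZON):
        action = policy(state)
        total = total + cost(state, action).mean()
        state = transition(state, action)
    opt.zero_grad()
    total.backward()                           # end-to-end gradient through the rollout
    opt.step()
```

Because the noise enters the transition in reparameterized form, the gradient of the expected cost with respect to the policy weights can be estimated by ordinary backpropagation over the sampled rollouts, without a high-variance score-function estimator.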
“…Examples of methods that approximate the original nonlinear planning problem through discretization include Monte-Carlo Tree Search (MCTS) (Chang et al., 2005; Kocsis and Szepesvári, 2006), numeric planning, and Q-learning (Watkins and Dayan, 1992), as well as approaches that resort to first- and/or second-order approximations such as Symbolic Dynamic Programming (SDP) (Sanner et al., 2011; Vianna et al., 2015; Zamani et al., 2012), MILP-based planning (Say, 2021; Say et al., 2017), and differentiable dynamic programming (DDP) (Jacobson and Mayne, 1970) and iLQG (Li and Todorov, 2004) from the optimal control literature. In contrast, methods that are general enough to avoid the need to approximate the original nonlinear problem, at the cost of settling for approximate solutions, include differentiable planning (i.e., methods based on planning through backpropagation, such as the TensorPlan planner (Wu et al., 2017) for optimizing plans and Deep Reactive Policies (Bueno et al., 2019) for learning policies parametrized as neural networks in continuous stochastic problems) and model-free policy gradients (Sutton et al., 1999; Williams and Zipser, 1995). Figure 1.2 classifies these approaches according to the nature of the approximations leveraged in each method.…”
Section: Problem Approximations and Approximate Solutions (mentioning)
confidence: 99%
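For contrast with the policy-learning style sketched earlier, here is a hypothetical sketch of the open-loop "plan optimization" flavor of planning through backpropagation mentioned above, where the decision variables are the action sequence itself rather than policy-network weights. The dynamics and cost are toy placeholders, not the cited planners' models.

```python
import torch

STATE_DIM, ACTION_DIM, HORIZON = 3, 2, 20

# Decision variables: one action vector per time step (an open-loop plan).
plan = torch.zeros(HORIZON, ACTION_DIM, requires_grad=True)
B = 0.1 * torch.randn(ACTION_DIM, STATE_DIM)   # toy dynamics matrix (assumption)

def transition(state, action):
    # Placeholder differentiable nonlinear dynamics.
    return state + torch.tanh(action) @ B

def cost(state, action):
    # Placeholder quadratic cost.
    return (state ** 2).sum() + 0.1 * (action ** 2).sum()

opt = torch.optim.Adam([plan], lr=0.05)
for step in range(300):
    state = torch.ones(STATE_DIM)              # fixed initial state
    total = torch.zeros(())
    for t in range(HORIZON):
        total = total + cost(state, plan[t])
        state = transition(state, plan[t])
    opt.zero_grad()
    total.backward()                           # backpropagate total cost through time
    opt.step()
```

The optimized `plan` is specific to the chosen initial state, whereas a DRP amortizes the optimization into a state-conditioned network, which is the distinction the quoted passage draws.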