2020
DOI: 10.1613/jair.1.11633
ASNets: Deep Learning for Generalised Planning

Abstract: In this paper, we discuss the learning of generalised policies for probabilistic and classical planning problems using Action Schema Networks (ASNets). The ASNet is a neural network architecture that exploits the relational structure of (P)PDDL planning problems to learn a common set of weights that can be applied to any problem in a domain. By mimicking the actions chosen by a traditional, non-learning planner on a handful of small problems in a domain, ASNets are able to learn a generalised reactive policy t…
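The two ideas the abstract names, per-schema weight sharing and imitation of a teacher planner, can be sketched as follows. This is a minimal illustration of the principle, not the authors' ASNet implementation: the class name, schema names, and feature layout are all hypothetical, and the real architecture is a deep network over the problem's relational structure rather than a linear scorer.

```python
# Minimal sketch (not the authors' implementation) of the core ASNet idea:
# every ground action instantiated from the same PDDL action schema shares
# one weight vector, so a policy learned on small problems can be applied
# to any problem in the domain. All names here are hypothetical.
import numpy as np

class SchemaSharedPolicy:
    def __init__(self, schemas, feat_dim, seed=0):
        rng = np.random.default_rng(seed)
        # one weight vector per action *schema*, not per ground action
        self.w = {s: rng.normal(size=feat_dim) for s in schemas}

    def scores(self, ground_actions, features):
        # ground_actions: list of (schema, args); features maps
        # (schema, args) -> feature vector for that ground action
        return np.array([self.w[s] @ features[(s, a)]
                         for s, a in ground_actions])

    def act(self, ground_actions, features):
        # pick the highest-scoring ground action (softmax argmax)
        return ground_actions[int(np.argmax(self.scores(ground_actions,
                                                        features)))]

    def imitation_step(self, ground_actions, features, teacher_idx, lr=0.1):
        # cross-entropy gradient ascent: raise the probability of the
        # action a teacher planner chose in this state
        probs = np.exp(self.scores(ground_actions, features))
        probs /= probs.sum()
        for i, (s, a) in enumerate(ground_actions):
            grad = (1.0 if i == teacher_idx else 0.0) - probs[i]
            self.w[s] += lr * grad * features[(s, a)]
```

Because the weights are indexed by schema, a policy trained on a small instance transfers unchanged to a larger instance of the same domain: the larger problem simply yields more ground actions per schema.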

Cited by 37 publications (56 citation statements)
References 60 publications
“…The use of numerical features that can be incremented and decremented qualitatively is inspired by QNPs (Srivastava et al. 2011; Bonet and Geffner 2020). Other works aimed at learning generalized policies or plans include planning programs (Segovia, Jiménez, and Jonsson 2016), logical programs (Silver et al. 2020), and deep learning approaches (Groshev et al. 2018; Bajpai, Garg, and Mausam 2018; Toyer et al. 2020), some of which have been used to learn heuristics (Shen, Trevizan, and Thiébaux 2020; Karia and Srivastava 2021).…”
Section: Related Work
confidence: 99%
“…In the past decade, deep learning (DL) methods have demonstrated remarkable success in a variety of complex applications in computer vision, natural language, and signal processing (Krizhevsky, Sutskever, and Hinton 2017; Hinton et al. 2012; Bengio, Lecun, and Hinton 2021). More recently, a variety of work has sought to leverage DL tools for planning and policy learning in a large variety of deterministic and stochastic decision-making domains (Wu, Say, and Sanner 2017; Wu, Say, and Sanner 2020; Say et al. 2020; Scaroni et al. 2020; Say 2021; Toyer et al. 2020; Garg, Bajpai, and Mausam 2020).…”
Section: Introduction
confidence: 99%
“…domain instantiations of these relational models (Groshev et al. 2018; Toyer et al. 2018; Bajpai, Garg, and Mausam 2018; Mausam 2019, 2020; Toyer et al. 2020). Other recent work has investigated planning by discrete and mixed integer optimization in learned discrete neural network models of planning domains (Say and Sanner 2018; Say et al. 2020).…”
Section: Introduction
confidence: 99%
“…In this paper, we consider the problem of learning generalized policies for classical planning domains using graph neural networks (Scarselli et al., 2008; Hamilton, 2020) from small instances represented in lifted STRIPS. The problem has been considered before but using neural architectures that are more complex and with results that are often less crisp, involving, in certain cases, heuristic information or search (Toyer et al., 2020; Garg et al., 2020; Rivlin et al., 2020; Karia and Srivastava, 2021; Shen et al., 2020). We use a simple and general GNN architecture and aim at obtaining crisp experimental results and a deeper understanding: either the policy greedy in the learned value function achieves close to 100% generalization over instances larger than those used in training, or the failure must be understood and, possibly, fixed using logical methods.…”
Section: Introduction
confidence: 99%
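The last quoted statement evaluates "the policy greedy in the learned value function". The construction itself is simple and can be sketched as below; the value estimator `V` and successor function are hypothetical stand-ins for the learned GNN and the domain's transition model, not anything from the cited papers.

```python
# Hedged sketch of a policy that is greedy in a learned value function:
# in each state, enumerate applicable actions and move to the successor
# with the lowest estimated cost-to-go. V and successors are hypothetical
# stand-ins for a learned GNN value estimator and a STRIPS simulator.
def greedy_policy(state, successors, V):
    """successors(state) -> dict mapping action name to next state;
    V(state) -> estimated cost-to-go. Returns the chosen action."""
    return min(successors(state).items(), key=lambda kv: V(kv[1]))[0]
```

Generalization in this setting means that `V`, trained only on small instances, still ranks successors correctly on much larger instances, so the greedy policy reaches the goal without search.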