2010
DOI: 10.1609/aaai.v24i1.7571
Learning Methods to Generate Good Plans: Integrating HTN Learning and Reinforcement Learning

Abstract: We consider how to learn Hierarchical Task Networks (HTNs) for planning problems in which both the quality of solution plans generated by the HTNs and the speed at which those plans are found are important. We describe an integration of HTN Learning with Reinforcement Learning to both learn methods by analyzing semantic annotations on tasks and to produce estimates of the expected values of the learned methods by performing Monte Carlo updates. We performed an experiment in which plan quality was inversely re…
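The abstract describes estimating the expected values of learned methods via Monte Carlo updates. A minimal sketch of that idea is incremental Monte Carlo averaging over observed plan returns; the class and method names below are illustrative, not taken from the paper:

```python
class MethodValueEstimator:
    """Tracks the expected return of each learned HTN method via
    incremental Monte Carlo averaging: V <- V + (G - V) / n."""

    def __init__(self):
        self.values = {}   # method name -> estimated expected return
        self.counts = {}   # method name -> number of observed returns

    def update(self, method, observed_return):
        # Incremental mean: equivalent to averaging all returns seen so far.
        n = self.counts.get(method, 0) + 1
        v = self.values.get(method, 0.0)
        self.counts[method] = n
        self.values[method] = v + (observed_return - v) / n

    def best(self, candidates):
        """Prefer the candidate method with the highest value estimate."""
        return max(candidates, key=lambda m: self.values.get(m, 0.0))


# Example: after observing plan returns of 10.0 and 6.0 for method "m1",
# its estimate is their mean, 8.0.
est = MethodValueEstimator()
est.update("m1", 10.0)
est.update("m1", 6.0)
est.update("m2", 5.0)
```

In an HTN planner, such estimates could bias method selection during decomposition toward methods that have historically produced higher-quality plans.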

Cited by 18 publications (10 citation statements).
References 8 publications (6 reference statements).
“…While the subgoal structure of a domain is important for handcrafting effective hierarchical task networks, HTNs do not actually encode subgoal structures but general top-down solving strategies where (non-primitive) methods decompose into other methods (Erol et al, 1994;Nau et al, 1999;Georgievski & Aiello, 2015). Techniques for learning HTNs usually appeal to annotated traces that convey the intended decompositions (Hogg et al, 2008;Zhuo et al, 2009). Other methods for deriving hierarchical decompositions in planning include precondition relaxations (Sacerdoti, 1974) and causal graphs (Knoblock, 1994).…”
Section: Related Work
confidence: 99%
“…While the subgoal structure of a domain is important for handcrafting effective hierarchical task networks, HTNs do not actually encode subgoal structures but general top-down strategies where (non-primitive) methods decompose into other methods (Erol, Hendler, & Nau, 1994;Nau, Cao, Lotem, & Munoz-Avila, 1999;Georgievski & Aiello, 2015). Techniques for learning HTNs usually appeal to annotated traces that convey the intended decompositions (Hogg, Munoz-Avila, & Kuter, 2008;Zhuo, Hu, Hogg, Yang, & Munoz-Avila, 2009), and other methods for deriving hierarchical decompositions in planning include precondition relaxations (Sacerdoti, 1974) and causal graphs (Knoblock, 1994).…”
Section: Related Work
confidence: 99%
“…However, it still requires demonstrations to be provided as decomposition trees, divided by which higher-level goal is being pursued at any given point. Hogg, Kuter, and Munoz-Avila (2010) integrate reinforcement learning into the HTN-MAKER framework in order to estimate values for the learned methods and decide which are more likely to be useful in any given situation. Using this setup, the authors improve the rate at which the HTN model learns from demonstration, requiring fewer examples before achieving competency.…”
Section: Related Work
confidence: 99%