Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control

Devin, Coline; Geng, Daniel; Abbeel, Pieter; Darrell, Trevor; Levine, Sergey

doi:10.48550/arxiv.1910.14033

Cited by 1 publication

(1 citation statement)

References 0 publications

“…Ordered-goal tasks have also been recently proposed in RL for training agents with symbolic and temporal rules [13][14][15][16][17][18][19][20]. Policy-concatenation (stitching) has also been proposed as a means of skill-learning in the Options framework [21][22][23], as well as Hierarchical Control [24,12], and other schemes for concatenating policies such as policy sketches (using modular policies and task-specific reward functions) [25], or compositional plan vectors [26]. Lastly, a compositional Boolean task algebra was recently introduced demonstrating zero-shot transfer for non-sequential policies [27].…”

Section: Introduction (mentioning)

confidence: 99%

Jump Operator Planning: Goal-Conditioned Policy Ensembles and Zero-Shot Transfer

Ringstrom, Hasanbeig, Abate

2020

Preprint

In Hierarchical Control, compositionality, abstraction, and task-transfer are crucial for designing versatile algorithms which can solve a variety of problems with maximal representational reuse. We propose a novel hierarchical and compositional framework called Jump-Operator Dynamic Programming for quickly computing solutions within a super-exponential space of sequential sub-goal tasks with ordering constraints, while also providing a fast linearly-solvable algorithm as an implementation. This approach involves controlling over an ensemble of reusable goal-conditioned policies functioning as temporally extended actions, and utilizes transition operators called feasibility functions, which are used to summarize initial-to-final state dynamics of the policies. Consequently, the added complexity of grounding a high-level task space onto a larger ambient state-space can be mitigated by optimizing in a lower-dimensional subspace defined by the grounding, substantially improving the scalability of the algorithm while effecting transferable solutions. We then identify classes of objective functions on this subspace whose solutions are invariant to the grounding, resulting in optimal zero-shot transfer.