Robotics: Science and Systems XVI 2020
DOI: 10.15607/rss.2020.xvi.003

Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image

Abstract: In this paper, we propose a deep convolutional recurrent neural network that predicts action sequences for task and motion planning (TAMP) from an initial scene image. Typical TAMP problems are formalized by combining reasoning on a symbolic, discrete level (e.g. first-order logic) with continuous motion planning such as nonlinear trajectory optimization. Due to the great combinatorial complexity of possible discrete action sequences, a large number of optimization/motion planning problems have to be solved to…
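As a rough illustration of the kind of architecture the abstract describes, the hedged PyTorch sketch below pairs a convolutional encoder of the initial scene image with an LSTM that scores discrete action tokens. The class name, layer sizes, and action vocabulary (`ActionSequencePredictor`, `num_actions`, the 64x64 input, etc.) are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch only: a convolutional encoder feeds an LSTM that scores
# discrete action tokens given an initial scene image. Layer sizes and the
# action vocabulary are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn

class ActionSequencePredictor(nn.Module):
    def __init__(self, num_actions=10, hidden=128):
        super().__init__()
        # Convolutional encoder for the initial scene image (3x64x64 assumed).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden),
        )
        # Recurrent decoder over symbolic action tokens, conditioned on the image code.
        self.embed = nn.Embedding(num_actions, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, image, action_tokens):
        code = self.encoder(image)            # (B, hidden) image embedding
        h0 = code.unsqueeze(0)                # (1, B, hidden) initial hidden state
        c0 = torch.zeros_like(h0)
        emb = self.embed(action_tokens)       # (B, T, hidden)
        out, _ = self.rnn(emb, (h0, c0))
        return self.head(out)                 # (B, T, num_actions) next-action logits

model = ActionSequencePredictor()
logits = model(torch.randn(1, 3, 64, 64), torch.tensor([[0, 3, 1]]))
print(logits.shape)                           # torch.Size([1, 3, 10])
```

Scoring a candidate action sequence with such a network is a single cheap forward pass, which is what makes this kind of predictor usable as guidance before any expensive trajectory optimization is attempted.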

Cited by 63 publications (40 citation statements)
References 32 publications
“…However, their search is over plan refinements, rather than abstract actions, and they do not address the state representation problem. Driess et al. (2020a) proposed an approach that directly predicts a task plan from an initial image of the scene, based on which motion-level planning was performed to find a motion plan that satisfies the predicted task plan. Our method differs in that (1) we assume we know the poses and shapes of objects, and (2) we provide guidance both at the task and motion levels based on a representation that can reason about occlusion, reachability, and collisions.…”
Section: Learning To Guide Planning
confidence: 99%
“…There are several orthogonal avenues of research under this umbrella: methods which learn capabilities that may be difficult to engineer (e.g. a pouring action) [14], those which learn the symbolic representations with which to plan [15], [16], those that integrate perception learning and scene understanding into TAMP [17], [18], and those which attempt to learn search guidance from experience [19], [20], [21]. Similar in spirit to our work, [19] tries to guide the search for action skeletons by learning to predict the credibility of a sequence of discrete actions directly from visual observations of the scene, using these predictions as a heuristic in a best-first search for action sequences.…”
Section: Related Work Integrated Task and Motion Planning
confidence: 99%
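The search pattern this excerpt describes, best-first expansion of discrete action skeletons ordered by a learned credibility score, can be sketched as follows. `score_fn`, `ACTIONS`, `max_len`, and `beam` are hypothetical placeholders, and the toy scoring function merely stands in for a neural network's prediction.

```python
# Hedged sketch of the described search pattern: best-first expansion of
# discrete action skeletons ordered by a learned credibility score.
# Names and bounds here are placeholders, not taken from the cited papers.
import heapq
import itertools

ACTIONS = ["pick(A)", "place(A)", "pick(B)", "place(B)"]   # toy symbolic actions

def best_first_skeletons(score_fn, max_len=3, beam=100):
    """Yield (skeleton, score) pairs in order of decreasing predicted credibility."""
    frontier = [(-score_fn(()), ())]              # min-heap over negated scores
    while frontier:
        neg_score, seq = heapq.heappop(frontier)
        yield list(seq), -neg_score
        if len(seq) < max_len:
            for a in ACTIONS:
                child = seq + (a,)
                heapq.heappush(frontier, (-score_fn(child), child))
        if len(frontier) > beam:                  # bound the frontier size
            frontier = heapq.nsmallest(beam, frontier)   # a sorted list is a valid heap

# Toy credibility function standing in for the learned network's prediction.
toy_score = lambda seq: 1.0 / (1.0 + len(seq))
for skeleton, s in itertools.islice(best_first_skeletons(toy_score), 5):
    print(round(s, 2), skeleton)
```

In the approach the excerpt refers to, the credibility score is conditioned on the scene observation, so the most promising skeletons are handed to the motion-level solver first.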
“…In contrast, we seek to learn an efficient heuristic to guide the construction of an optimistic planning problem with a minimal set of irrelevant facts. We leverage an off-the-shelf domain-independent search sub-routine; thus, our work can be synergistically combined with methods like [19], [20], [21] that learn a domain-specific heuristic.…”
Section: Related Work Integrated Task and Motion Planning
confidence: 99%
“…This is especially relevant for the alignment term φ_align, which has multiple local minima. Instead of explicitly enumerating different possible alignments as in [41], [42], we diversify grasp candidate computation by introducing a random alignment term, effectively randomizing the approach axis of a grasp candidate. That is, at initialization we compute a random orthonormal basis…”
Section: A Grasp Configuration Observer
confidence: 99%
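The randomization step mentioned at the end of this excerpt (drawing a random orthonormal basis, e.g. to randomize a grasp candidate's approach axis) can be illustrated with the small NumPy sketch below. The function name and the choice of QR decomposition are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch: draw a random orthonormal basis via QR decomposition of a
# Gaussian matrix, e.g. to randomize a grasp candidate's approach axis.
# Not the implementation from the cited paper.
import numpy as np

def random_orthonormal_basis(rng=None):
    """Return a 3x3 rotation matrix whose columns form a random orthonormal basis."""
    rng = np.random.default_rng() if rng is None else rng
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))                 # sign convention for a well-spread draw
    if np.linalg.det(q) < 0:                 # enforce a right-handed frame (det = +1)
        q[:, 2] *= -1
    return q

basis = random_orthonormal_basis()
approach_axis = basis[:, 0]                  # e.g. take one column as the approach axis
print(np.allclose(basis.T @ basis, np.eye(3)))   # True: columns are orthonormal
```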