“…The bi-objective criteria of solution cost and search effort is best reasoned about in the paradigm of anytime planning . In this paradigm, an algorithm traces out the Pareto frontier (Choudhury et al, 2016): finds a feasible solution quickly and iteratively improves it. In this paradigm, SaIL trains a heuristic that displays a behavior we would expect in the first iteration.…”
Robot planning is the process of selecting a sequence of actions that optimize for a task specific objective. For instance, the objective for a navigation task would be to find collision free paths, while the objective for an exploration task would be to map unknown areas. The optimal solutions to such tasks are heavily influenced by the implicit structure in the environment, i.e. the configuration of objects in the world. State-of-the-art planning approaches, however, do not exploit this structure, thereby expending valuable effort searching the action space instead of focusing on potentially good actions. In this paper, we address the problem of enabling planners to adapt their search strategies by inferring such good actions in an efficient manner using only the information uncovered by the search up until that time.We formulate this as a problem of sequential decision making under uncertainty where at a given iteration a planning policy must map the state of the search to a planning action. Unfortunately, the training process for such partial information based policies is slow to converge and susceptible to poor local minima. Our key insight is that if we could fully observe the underlying world map, we would easily be able to disambiguate between good and bad actions. We hence present a novel datadriven imitation learning framework to efficiently train planning policies by imitating a clairvoyant oracle -an oracle that at train time has full knowledge about the world map and can compute optimal decisions. We leverage the fact that for planning problems, such oracles can be efficiently computed and derive performance guarantees for the learnt policy. We examine two important domains that rely on partial information based policies -informative path planning and search based motion planning. We validate the approach on a spectrum of environments for both problem domains, including experiments on a real UAV, and show that the learnt policy consistently outperforms stateof-the-art algorithms. Our framework is able to train policies that achieve upto 39% more reward than state-of-the art information gathering heuristics and a 70x speedup as compared to A* on search based planning problems. Our approach paves the way forward for applying data-driven techniques to other such problem domains under the umbrella of robot planning.
“…The bi-objective criteria of solution cost and search effort is best reasoned about in the paradigm of anytime planning . In this paradigm, an algorithm traces out the Pareto frontier (Choudhury et al, 2016): finds a feasible solution quickly and iteratively improves it. In this paradigm, SaIL trains a heuristic that displays a behavior we would expect in the first iteration.…”
Robot planning is the process of selecting a sequence of actions that optimize for a task specific objective. For instance, the objective for a navigation task would be to find collision free paths, while the objective for an exploration task would be to map unknown areas. The optimal solutions to such tasks are heavily influenced by the implicit structure in the environment, i.e. the configuration of objects in the world. State-of-the-art planning approaches, however, do not exploit this structure, thereby expending valuable effort searching the action space instead of focusing on potentially good actions. In this paper, we address the problem of enabling planners to adapt their search strategies by inferring such good actions in an efficient manner using only the information uncovered by the search up until that time.We formulate this as a problem of sequential decision making under uncertainty where at a given iteration a planning policy must map the state of the search to a planning action. Unfortunately, the training process for such partial information based policies is slow to converge and susceptible to poor local minima. Our key insight is that if we could fully observe the underlying world map, we would easily be able to disambiguate between good and bad actions. We hence present a novel datadriven imitation learning framework to efficiently train planning policies by imitating a clairvoyant oracle -an oracle that at train time has full knowledge about the world map and can compute optimal decisions. We leverage the fact that for planning problems, such oracles can be efficiently computed and derive performance guarantees for the learnt policy. We examine two important domains that rely on partial information based policies -informative path planning and search based motion planning. We validate the approach on a spectrum of environments for both problem domains, including experiments on a real UAV, and show that the learnt policy consistently outperforms stateof-the-art algorithms. Our framework is able to train policies that achieve upto 39% more reward than state-of-the art information gathering heuristics and a 70x speedup as compared to A* on search based planning problems. Our approach paves the way forward for applying data-driven techniques to other such problem domains under the umbrella of robot planning.
“…An alternate approach to modeling belief over configuration spaces is to assume edges are locally correlated. Under this assumption, one can use local models such as KDE [8], mixture of Gaussians [20], RKHS [28] or even customized models [27]. The efficacy of these models depends on how accurately they can represent the world, how efficiently they can be updated and how efficiently they can be projected on the graph.…”
Robots operate in environments with varying implicit structure. For instance, a helicopter flying over terrain encounters a very different arrangement of obstacles than a robotic arm manipulating objects on a cluttered table top. State-of-the-art motion planning systems do not exploit this structure, thereby expending valuable planning effort searching for implausible solutions. We are interested in planning algorithms that actively infer the underlying structure of the valid configuration space during planning in order to find solutions with minimal effort. Consider the problem of evaluating edges on a graph to quickly discover collision-free paths. Evaluating edges is expensive, both for robots with complex geometries like robot arms, and for robots with limited onboard computation like UAVs. Until now, this challenge has been addressed via laziness i.e. deferring edge evaluation until absolutely necessary, with the hope that edges turn out to be valid. However, all edges are not alike in valuesome have a lot of potentially good paths flowing through them, and some others encode the likelihood of neighbouring edges being valid. This leads to our key insight -instead of passive laziness, we can actively choose edges that reduce the uncertainty about the validity of paths. We show that this is equivalent to the Bayesian active learning paradigm of decision region determination (DRD). However, the DRD problem is not only combinatorially hard, but also requires explicit enumeration of all possible worlds. We propose a novel framework that combines two DRD algorithms, DIRECT and BISECT, to overcome both issues. We show that our approach outperforms several state-of-the-art algorithms on a spectrum of planning problems for mobile robots, manipulators and autonomous helicopters.
“…There are approaches that divide the configuration space into regions and either select different region-specific planning strategies [21] or use entropy of samples in a particular region to refine sampling [22]. Other methods try to model the free space to speed up planning [23][24][25]. While these techniques are quite successful in a large set of problems, they can place samples in regions where an optimal path is unlikely to traverse.…”
We consider the problem of leveraging prior experience to generate roadmaps in sampling-based motion planning. A desirable roadmap is one that is sparse, allowing for fast search, with nodes spread out at key locations such that a lowcost feasible path exists. An increasingly popular approach is to learn a distribution of nodes that would produce such a roadmap. State-of-the-art is to train a conditional variational auto-encoder (CVAE) on the prior dataset with the shortest paths as target input. While this is quite effective on many problems, we show it can fail in the face of complex obstacle configurations or mismatch between training and testing.We present an algorithm LEGO that addresses these issues by training the CVAE with target samples that satisfy two important criteria. Firstly, these samples belong only to bottleneck regions along near-optimal paths that are otherwise difficultto-sample with a uniform sampler. Secondly, these samples are spread out across diverse regions to maximize the likelihood of a feasible path existing. We formally define these properties and prove performance guarantees for LEGO. We extensively evaluate LEGO on a range of planning problems, including robot arm planning, and report significant gains over heuristics as well as learned baselines.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.