Abstract: Existing high-dimensional motion planning algorithms are simultaneously overpowered and underpowered. In domains sparsely populated by obstacles, the heuristics used by sampling-based planners to navigate "narrow passages" can be needlessly complex; furthermore, additional post-processing is required to remove jerky or extraneous motions from the paths that such planners generate. In this paper, we present CHOMP, a novel method for continuous path refinement that uses covariant gradient techniques to improve the quality of sampled trajectories. Our optimization technique converges over a wider range of input paths and can optimize higher-order dynamics of trajectories, unlike previous path optimization strategies. As a result, CHOMP can be used as a standalone motion planner in many real-world planning queries. The effectiveness of our proposed method is demonstrated in manipulation planning for a 6-DOF robotic arm as well as in trajectory generation for a walking quadruped robot.
In this paper, we present CHOMP (Covariant Hamiltonian Optimization for Motion Planning), a method for trajectory optimization that is invariant to reparametrization. CHOMP uses functional gradient techniques to iteratively improve the quality of an initial trajectory, optimizing a functional that trades off between a smoothness component and an obstacle avoidance component. CHOMP can be used to locally optimize feasible trajectories, as well as to solve motion planning queries, converging to low-cost trajectories even when initialized with infeasible ones. It uses Hamiltonian Monte Carlo to alleviate the problem of convergence to high-cost local minima (and to achieve probabilistic completeness), and is capable of respecting hard constraints along the trajectory. We present extensive experiments with CHOMP on manipulation and locomotion tasks, using 7-DOF manipulators and a rough-terrain quadruped robot.
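The covariant update at the heart of this approach can be sketched in a few lines. Below is a minimal 2-D toy: a discretized trajectory is pulled away from a single circular obstacle while a finite-difference smoothness metric preconditions the gradient. The obstacle cost field, weights, step size, and waypoint count are all illustrative assumptions; the actual method evaluates obstacle costs over the robot's body via precomputed distance fields and adds Hamiltonian Monte Carlo perturbations.

```python
import numpy as np

def chomp_refine(start, goal, obs, r, n=50, iters=1000, eta=50.0, w_obs=0.05):
    """Covariant-gradient refinement of a 2-D waypoint trajectory.

    The circular obstacle cost and all weights are toy assumptions;
    they stand in for CHOMP's distance-field obstacle functional.
    """
    m = n - 2                                   # free interior waypoints
    xi = np.linspace(start, goal, n)[1:-1]      # straight-line initialization
    # Smoothness metric A: discrete Laplacian with fixed endpoints.
    A = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    A_inv = np.linalg.inv(A)
    for _ in range(iters):
        full = np.vstack([start, xi, goal])
        # Euclidean gradient of the smoothness term (discrete Laplacian).
        g_smooth = 2.0 * full[1:-1] - full[:-2] - full[2:]
        # Gradient of a hinge-like obstacle cost 0.5*(r - d)^2 inside the circle.
        d = xi - obs
        dist = np.maximum(np.linalg.norm(d, axis=1, keepdims=True), 1e-9)
        g_obs = np.where(dist < r, -(r - dist) * d / dist, 0.0)
        # Covariant update: preconditioning by A^{-1} turns a point-wise
        # obstacle push into a smooth deformation of the whole trajectory.
        xi = xi - (1.0 / eta) * A_inv @ (g_smooth + w_obs * g_obs)
    return np.vstack([start, xi, goal])
```

Because the gradient is preconditioned by the inverse smoothness metric, an obstacle force at one waypoint bends the entire trajectory smoothly instead of kinking a single point, which is what makes the update invariant to reparametrization.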
Imitation learning of sequential, goal-directed behavior by standard supervised techniques is often difficult. We frame learning such behaviors as a maximum margin structured prediction problem over a space of policies. In this approach, we learn mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert's behavior. Further, we demonstrate a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference. Although the technique is general, it is particularly relevant where A* and dynamic programming approaches make learning policies tractable beyond the limitations of a QP formulation. We demonstrate our approach applied to route planning for outdoor mobile robots, where the behavior a designer wishes a planner to execute is often clear, while specifying cost functions that engender this behavior is a much more difficult task.
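The subgradient recipe described above can be illustrated on a toy grid: learn per-feature weights so that the planner's cheapest path matches an expert demonstration. The map, features, margin handling, and all constants below are illustrative assumptions; in particular, clipping loss-augmented costs to stay positive for Dijkstra is a simplification of proper loss-augmented inference.

```python
import heapq
import numpy as np

def plan(cost, start, goal):
    """Dijkstra on a 4-connected grid, paying each cell's cost on entry."""
    H, W = cost.shape
    dist, prev = {start: cost[start]}, {}
    pq = [(cost[start], start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist[u]:
            continue
        for du, dv in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (u[0] + du, u[1] + dv)
            if 0 <= v[0] < H and 0 <= v[1] < W and d + cost[v] < dist.get(v, np.inf):
                dist[v], prev[v] = d + cost[v], u
                heapq.heappush(pq, (dist[v], v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Toy 5x5 map with two features per cell: grass and road.
H = W = 5
road = np.zeros((H, W), bool)
road[0, :] = road[:, -1] = True
feats = np.stack([(~road).astype(float), road.astype(float)])   # (2, H, W)

# Expert demonstration: follow the road from (0, 0) to (4, 4).
expert = [(0, j) for j in range(W)] + [(i, W - 1) for i in range(1, H)]
on_expert = np.zeros((H, W), bool)
for r, c in expert:
    on_expert[r, c] = True
expert_counts = sum(feats[:, r, c] for r, c in expert)

w_vec, alpha, lam = np.ones(2), 0.1, 0.1
for _ in range(50):
    cost = np.tensordot(w_vec, feats, axes=1)
    # Loss-augmented inference: cells off the expert path get a margin
    # discount; clipping keeps costs positive for Dijkstra (a shortcut).
    aug = np.maximum(np.where(on_expert, cost, cost - 1.0), 0.01)
    counts = sum(feats[:, r, c] for r, c in plan(aug, (0, 0), (H - 1, W - 1)))
    # Subgradient of the margin objective, plus L2 regularization.
    w_vec = np.maximum(w_vec - alpha * (expert_counts - counts + lam * w_vec), 0.01)
```

Each iteration only requires one call to a fast planner, which is the point of the subgradient formulation: inference does the heavy lifting that a monolithic QP over all constraints cannot scale to.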
Abstract: We present a novel approach for determining robot movements that efficiently accomplish the robot's tasks while not hindering the movements of people within the environment. Our approach models the goal-directed trajectories of pedestrians using maximum entropy inverse optimal control. The advantage of this modeling approach is the generality of its learned cost function to changes in the environment and to entirely different environments. We employ the predictions of this model of pedestrian trajectories in a novel incremental planner and quantitatively show the improvement in hindrance-sensitive robot trajectory planning provided by our approach.

I. INTRODUCTION

Determining appropriate robotic actions in environments with moving people is a well-studied [15], [2], [5] but often difficult task due to the uncertainty of each person's future behavior. Robots should certainly never collide with people [11], but avoiding collisions alone is often unsatisfactory because the disruption of almost colliding can be burdensome to people and sub-optimal for robots. Instead, robots should predict the future locations of people and plan routes that avoid such hindrances (i.e., situations where a person's natural behavior is disrupted by a robot's proximity) while still efficiently achieving the robot's objectives. For example, given the origins and target destinations of the robot and person in Figure 1, the robot's hindrance-minimizing trajectory would take the longer way around the center obstacle (a table), leaving a clear path for the pedestrian.

One common approach for predicting trajectories is to project the prediction step of a tracking filter [9], [13], [10] forward over time. For example, a Kalman filter [7] predicts future positions according to a Gaussian distribution with growing uncertainty that, unfortunately, often assigns high probability to physically impossible locations (e.g., behind walls, within obstacles).
Particle filters [16] can incorporate more sophisticated constraints and non-Gaussian distributions, but over large time horizons they degrade into random walks of feasible motion rather than purposeful, goal-based motion. Closer to our research are approaches that directly model the policy [6]. These approaches assume that previously observed trajectories capture all purposeful behavior, and that the only uncertainty involves determining to which previously observed class of trajectories the current behavior belongs. Models based on mixtures of trajectories and conditioned action distribution modeling (using hidden Markov models) have been employed [17]. Such approaches often suffer from over-fitting to the particular training trajectories and their context. When the environment changes (e.g., rearrangement of the furniture), the model will confidently predict incorrect trajectories through obstacles.
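The growing-uncertainty behavior of projecting a filter's prediction step forward is easy to reproduce with the prediction equations alone. The constant-velocity model, noise covariance, and initial state below are toy assumptions chosen only to illustrate the effect.

```python
import numpy as np

dt = 0.5
# Constant-velocity pedestrian state [x, y, vx, vy]; A, Q, and the
# initial state are toy values, not fitted to any real tracker.
A = np.eye(4)
A[0, 2] = A[1, 3] = dt
Q = 0.05 * np.eye(4)                  # process noise added each step
mu = np.array([0.0, 0.0, 1.0, 0.0])   # walking along +x at 1 m/s
P = 0.01 * np.eye(4)

traces = []
for _ in range(10):                   # open-loop prediction: no measurement updates
    mu = A @ mu                       # mean drifts along the current velocity
    P = A @ P @ A.T + Q               # covariance grows without bound
    traces.append(np.trace(P[:2, :2]))
```

Nothing in this recursion knows about walls, obstacles, or goals, which is exactly why the resulting Gaussian places ever more probability mass on physically impossible locations as the horizon grows.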
Fig. 1. Policies for opening a cabinet drawer and swing-peg-in-hole tasks, trained by alternately performing reinforcement learning with multiple agents in simulation and updating the simulation parameter distribution using a few real-world policy executions.

Abstract: We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real-world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve policy transfer by matching policy behavior in simulation and the real world. We show that policies trained with our method reliably transfer to different robots in two real-world tasks: swing-peg-in-hole and opening a cabinet drawer. A video of our experiments can be found at https://sites.google.com/view/simopt.
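The core idea of adapting a simulation parameter distribution to match real roll-outs can be sketched with a deliberately tiny stand-in: a 1-D sliding-block "simulator" with one unknown friction parameter, and a cross-entropy-style update of a Gaussian over that parameter. Everything here is an illustrative assumption; the actual method interleaves these distribution updates with reinforcement-learning policy training and compares trajectories of policy executions, not raw physics roll-outs.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(friction, steps=20):
    """Toy 1-D sliding-block rollout; a stand-in for a full simulator."""
    v, xs = 5.0, []
    for _ in range(steps):
        v = max(v - friction, 0.0)
        xs.append(v)
    return np.array(xs)

real_traj = rollout(0.3)          # pretend this trace came from hardware

# Gaussian over the sim parameter, adapted cross-entropy style so that
# simulated roll-outs match the real one.
mean, std = 1.0, 0.5
for _ in range(15):
    samples = np.clip(rng.normal(mean, std, 64), 0.0, None)
    errs = [np.sum((rollout(s) - real_traj) ** 2) for s in samples]
    elite = samples[np.argsort(errs)[:8]]   # keep best-matching parameters
    mean, std = elite.mean(), max(elite.std(), 0.02)
```

The distribution contracts toward parameter settings whose simulated behavior reproduces the real trace, which is the mechanism that replaces hand-tuned domain randomization.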