2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros.2016.7759328

Watch this: Scalable cost-function learning for path planning in urban environments

Abstract: In this work, we present an approach to learn cost maps for driving in complex urban environments from a very large number of demonstrations of driving behaviour by human experts. The learned cost maps are constructed directly from raw sensor measurements, bypassing the effort of manually designing cost maps as well as features. When deploying the learned cost maps, the trajectories generated not only replicate human-like driving behaviour but are also demonstrably robust against systematic errors in putative …
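
As a rough illustration of the idea in the abstract (a hedged sketch, not the authors' implementation): the cost map can be produced by a small fully-convolutional network that maps a rasterised grid of raw sensor measurements to a per-cell cost. The PyTorch architecture and all names below are illustrative assumptions.

```python
# Illustrative sketch only: a fully-convolutional network mapping a raw sensor
# raster to a per-cell cost map, in the spirit of learned cost-function planning.
import torch
import torch.nn as nn

class CostMapNet(nn.Module):
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),  # one cost value per grid cell
        )

    def forward(self, sensor_grid: torch.Tensor) -> torch.Tensor:
        # sensor_grid: (batch, channels, H, W) raster of raw sensor measurements
        return self.net(sensor_grid).squeeze(1)  # (batch, H, W) cost map

# Example usage on a dummy 128x128 grid with 3 input channels:
costs = CostMapNet()(torch.randn(1, 3, 128, 128))
```

A planner deployed on such a map would then search for minimum-cost trajectories, which is how the learned behaviour is replicated at test time.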

Cited by 112 publications (131 citation statements). References 24 publications.

“…IOC for path prediction: Kitani et al. [23] recover human preferences (i.e., a reward function) to forecast plausible paths for a pedestrian using inverse optimal control (IOC), also known as inverse reinforcement learning (IRL) [1,52], while [26] adapts IOC and proposes a dynamic reward function to handle changing environments in sequential path prediction. Combined with a deep neural network, deep IOC/IRL has been proposed to learn non-linear reward functions and has shown promising results in robot control [11] and driving [50] tasks. However, one critical assumption made in IOC frameworks, which makes them hard to apply to general path prediction tasks, is that the goal state or destination of the agent must be given a priori, so that feasible paths to that destination can be found from the planning or control point of view.…”
Section: Related Work (mentioning)
confidence: 99%
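
A brief contextual note (not taken from the quoted paper): the IOC/IRL methods referred to above typically rest on the maximum-entropy model, under which a trajectory's probability grows exponentially with its accumulated reward and the normaliser sums over feasible paths to the given goal:

\[
P(\tau \mid \theta) = \frac{1}{Z(\theta)} \exp\!\Big(\sum_{s \in \tau} r_\theta(s)\Big),
\qquad
Z(\theta) = \sum_{\tau' \rightarrow \text{goal}} \exp\!\Big(\sum_{s \in \tau'} r_\theta(s)\Big).
\]

The need to evaluate Z(θ) over paths to a known goal is exactly the a-priori destination assumption criticised in the quotation.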
“…Recently, Wulfmeier et al. [29][30][31] proposed deep IRL, which combines MaxEnt-IRL with a deep neural network architecture to find nonlinear reward functions. However, their method suffers from the same three problems as MaxEnt-IRL.…”
Section: Related Work (mentioning)
confidence: 99%
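
For context on the deep IRL scheme cited above, here is a minimal sketch of one MaxEnt-IRL update with a neural-network reward, in which the difference between expert and expected state visitation frequencies (SVFs) is backpropagated through the network. The helpers `compute_expected_svf` and `expert_svf`, and the PyTorch setup, are assumptions for illustration, not code from the cited works.

```python
# Hedged sketch of a single deep MaxEnt-IRL update step (illustrative only).
import torch

def irl_update(net, optimizer, sensor_grid, expert_svf, compute_expected_svf):
    optimizer.zero_grad()
    cost_map = net(sensor_grid)   # learned per-cell cost from the raw sensor raster
    reward = -cost_map            # reward is taken as the negative of the cost
    # A soft planner (e.g. value iteration) yields the expected state visitation
    # frequencies under the current reward; it is treated as constant w.r.t. the net.
    mu_expected = compute_expected_svf(reward.detach())
    # Gradient of the negative data log-likelihood w.r.t. the reward map is
    # (expected SVF - expert SVF); feeding it as the upstream gradient and then
    # stepping the optimizer ascends the demonstration likelihood.
    reward.backward(gradient=(mu_expected - expert_svf))
    optimizer.step()
```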
“…However, we could not compare our method with deep MaxEnt-IRL [29][30][31] because deep MaxEnt-IRL has to find an optimal policy at every iteration, which took an enormous amount of time on Atari 2600 games. Therefore, we selected PI_LOC [9] for comparison, because D_b can be used to evaluate the partition function.…”
Section: Atari Games (mentioning)
confidence: 99%
“…For path planning, the trajectory with the highest probability under the learned model is chosen, with the goal of closely imitating pedestrian motion. Wulfmeier et al. [12] present a similar approach that uses deep IRL instead of a combination of classical features to learn how to drive an autonomous car through static environments.…”
Section: A. Learning By Demonstration (mentioning)
confidence: 99%