2020 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra40945.2020.9197199
Learning Navigation Costs from Demonstration in Partially Observable Environments

Cited by 9 publications (14 citation statements)
References 10 publications
“…Reinforcement Learning (RL) allows autonomous systems to learn policies for complex tasks such as control [12] and obstacle avoidance [13], reducing the engineering effort to the design of a suitable reward function [14]. In combination with deep neural networks, deep RL provides a powerful tool for mapping high-dimensional sensory inputs to optimal actions, showing promising results in fields such as autonomous driving [15] and robot navigation [16], [17].…”
Section: Related Work
confidence: 99%
“…For example, α → ∞ means that the expert takes strictly optimal controls, while α = 0 means that controls are selected uniformly at random. The Boltzmann expert model was previously introduced and studied in [30], [33], [34].…”
Section: B. Demonstrator Model
confidence: 99%
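For reference, the Boltzmann (softmax) expert model referred to in this statement is commonly written as a distribution over controls u in state x, with Q denoting the optimal cost-to-go. This is a generic sketch in standard inverse-RL notation, not a quotation of the cited paper's exact definition:

\pi_\alpha(u \mid x) = \frac{\exp\bigl(-\alpha\, Q(x, u)\bigr)}{\sum_{u'} \exp\bigl(-\alpha\, Q(x, u')\bigr)}

As α → ∞ the distribution concentrates on minimum-cost controls, and at α = 0 it is uniform over controls, matching the two limiting cases described above.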
“…We discuss how to differentiate the loss function L_c(θ) in (6) with respect to θ through the deterministic shortest path problem defined by the product WFA-MDP model. Wang et al. [30] introduce a subgradient descent approach to differentiate the log-likelihood of the expert demonstrations, evaluated by the Boltzmann policy in (5), through the optimal cost-to-go values in (7). The cost parameters can be updated by stochastic subgradient descent at each iteration k with learning rate γ^(k):…”
Section: Optimizing Cost Parameters
confidence: 99%
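The quoted statement is truncated before the displayed update. The standard stochastic subgradient step it leads into would read as follows; this is a reconstruction from the surrounding text, not a quotation from the cited paper:

\theta^{(k+1)} = \theta^{(k)} - \gamma^{(k)}\, \nabla_\theta L_c\bigl(\theta^{(k)}\bigr)

where ∇_θ L_c(θ^(k)) denotes a subgradient of the loss in (6) evaluated on the sampled expert demonstrations.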
“…This paper is a revised and extended version of our previous conference publications (Wang et al. 2020a,b). In our previous work (Wang et al. 2020a), we proposed differentiable mapping and planning stages to learn the expert cost function. The cost function is parameterized as a neural network over binary occupancy probabilities, updated from local distance observations.…”
Section: Mapping and Planning
confidence: 99%
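To make the parameterization in this statement concrete, below is a minimal sketch of a cost function represented as a neural network over a grid of binary occupancy probabilities. This is an illustrative assumption, not the authors' code: the class name CostNet, the convolutional architecture, and the grid shape are all hypothetical.

import torch
import torch.nn as nn

class CostNet(nn.Module):
    """Maps an occupancy-probability grid to a non-negative cost map."""

    def __init__(self, in_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            # Convolutional stage over the occupancy-probability grid.
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
            # Softplus keeps the predicted costs non-negative.
            nn.Softplus(),
        )

    def forward(self, occupancy_probs):
        # occupancy_probs: (B, 1, H, W) tensor of P(cell occupied) in [0, 1].
        return self.net(occupancy_probs)

# Usage: compute a cost map for a random 32x32 occupancy-probability grid.
probs = torch.rand(1, 1, 32, 32)
cost_map = CostNet()(probs)

Because every stage is differentiable, gradients of a planning loss can flow through the cost map back to the network parameters, which is the property the differentiable mapping-and-planning pipeline described above relies on.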