2019
DOI: 10.1109/lra.2019.2903259
Bi-Directional Value Learning for Risk-Aware Planning Under Uncertainty

Abstract: Decision-making under uncertainty is a crucial ability for autonomous systems. In its most general form, this problem can be formulated as a Partially Observable Markov Decision Process (POMDP). The solution policy of a POMDP can be implicitly encoded as a value function. In partially observable settings, the value function is typically learned via forward simulation of the system evolution. Focusing on accurate and long-range risk assessment, we propose a novel method, where the value function is learned in d…
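To make the abstract's notion of "learning the value function via forward simulation" concrete, the sketch below estimates the value of a belief by Monte Carlo rollouts in a tiny randomly generated discrete model. This is only an illustrative toy example, not the paper's bi-directional algorithm; the model arrays T and R, the discount factor, and the random rollout policy are hypothetical placeholders.

# Illustrative sketch only (not the paper's bi-directional method): estimate the
# value of a belief by forward simulation (Monte Carlo rollouts) in a tiny,
# randomly generated discrete model. All model arrays and the policy are placeholders.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 3, 2
T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # T[s, a, s']
R = rng.normal(size=(n_states, n_actions))                        # R[s, a]
gamma = 0.95

def rollout_value(belief, policy, horizon=50, n_rollouts=500):
    """Average discounted return of trajectories forward-simulated from `belief`."""
    total = 0.0
    for _ in range(n_rollouts):
        s = rng.choice(n_states, p=belief)      # sample an initial state from the belief
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            a = policy(s)                       # placeholder policy (state-based for simplicity)
            ret += discount * R[s, a]
            s = rng.choice(n_states, p=T[s, a])
            discount *= gamma
        total += ret
    return total / n_rollouts

# Example: value of the uniform belief under a uniformly random policy.
uniform_belief = np.ones(n_states) / n_states
print(rollout_value(uniform_belief, policy=lambda s: rng.integers(n_actions)))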

Cited by 31 publications (24 citation statements)
References 21 publications
“…, y_t). The belief state b_t can be recursively updated with the following transition function τ (Kim et al., 2019)…”
Section: Preliminaries (mentioning)
confidence: 99%
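As a concrete illustration of the recursive belief update referenced in the excerpt above, here is a minimal Bayes-filter sketch for a discrete POMDP. The array layout (T[s, a, s'] for transitions, Z[s', a, o] for observation likelihoods) is an assumption made for this example, not notation taken from the cited paper.

import numpy as np

def belief_update(b, a, o, T, Z):
    """Recursive update tau: b'(s') ∝ Z[s', a, o] * sum_s T[s, a, s'] * b(s)."""
    predicted = T[:, a, :].T @ b        # prediction step: marginalize the previous state
    updated = Z[:, a, o] * predicted    # correction step: weight by observation likelihood
    return updated / updated.sum()      # normalize so b' is a probability distribution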
“…Various types of reward modification in POMDPs have been investigated in previous research efforts (Lee et al., 2018; Kim et al., 2019). Typically, the reward function in POMDPs is designed to solve the stochastic shortest path problem, where the goal is to compute a feedback plan that reaches a target state from a known initial state by maximizing the expected total reward.…”
Section: Related Work (mentioning)
confidence: 99%
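A common way to realize the stochastic-shortest-path reward design mentioned above is a per-step cost plus a terminal bonus at the target and a penalty for failure states, so that maximizing expected total reward favors short, low-risk paths. The grid cells and numeric values below are illustrative placeholders, not taken from the cited works.

# Hypothetical grid-world reward in the stochastic-shortest-path style.
GOAL = (4, 4)            # target cell (placeholder)
STEP_COST = -1.0         # per-step cost encourages shorter paths
COLLISION_COST = -100.0  # penalty for entering an obstacle cell
GOAL_REWARD = 50.0       # terminal bonus for reaching the target

def reward(cell, obstacle_cells):
    """Reward whose maximization yields short, collision-averse paths to GOAL."""
    if cell == GOAL:
        return GOAL_REWARD
    if cell in obstacle_cells:
        return COLLISION_COST
    return STEP_COST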
“…For replanning purposes, forward search algorithms can be used in an RHC scheme. Recently, methods using RHC have been extended to the belief space as well as to dynamic environments (Agha‐mohammadi, Agarwal, Kim, Chakravorty, & Amato, ; Chakravorty & Erwin, ; Erez & Smart, ; He, Brunskill, & Roy, ; Kim, Thakker, & Agha‐mohammadi, ; Platt, Tedrake, Kaelbling, & Lozano‐Perez, ; Toit & Burdick, ). In an RHC scheme, optimization is performed only within a limited horizon; the system optimizes over that horizon, executes the next immediate action, and then moves the optimization horizon one step forward before repeating the process.…”
Section: Related Work (mentioning)
confidence: 99%
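The RHC loop described in the excerpt above can be sketched schematically as follows. plan_over_horizon, execute_and_update_belief, and at_goal are hypothetical stand-ins for a belief-space planner, the action execution plus belief update, and a termination test; none of them refer to an actual API from the cited works.

# Schematic receding-horizon control loop: optimize over a short horizon,
# execute only the first planned action, shift the horizon forward, and repeat.
def receding_horizon_control(belief, horizon=5, max_steps=200):
    for _ in range(max_steps):
        plan = plan_over_horizon(belief, horizon)             # optimize within the limited horizon
        belief = execute_and_update_belief(belief, plan[0])   # take the next immediate action
        if at_goal(belief):                                   # stop once the goal is reached
            break
    return belief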
“…For example, the works in [8], [9] improved the performance of VO localization by actively choosing the timing and camera direction to obtain an optimal image sequence using a predictive perception technique. This problem is typically approached as a Partially Observable Markov Decision Process (POMDP), or belief-space planning [10], [11], [12], [13], [14], where the planner chooses optimal actions under motion and sensing uncertainty.…”
Section: Introduction (mentioning)
confidence: 99%