2018 Annual American Control Conference (ACC)
DOI: 10.23919/acc.2018.8431911
Human-in-the-Loop Synthesis for Partially Observable Markov Decision Processes

Abstract: We study planning problems where autonomous agents operate inside environments that are subject to uncertainties and not fully observable. Partially observable Markov decision processes (POMDPs) are a natural formal model to capture such problems. Because of the potentially huge or even infinite belief space in POMDPs, synthesis with safety guarantees is, in general, computationally intractable. We propose an approach that aims to circumvent this difficulty: in scenarios that can be partially or fully simulate…
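The intractability mentioned in the abstract stems from planning over beliefs, i.e. probability distributions over states, rather than over states themselves. A minimal sketch of the standard Bayesian belief update, on a hypothetical two-state, two-action, two-observation POMDP (all matrices here are illustrative assumptions, not from the paper):

```python
import numpy as np

# Hypothetical tiny POMDP.
# T[a][s, s'] : probability of moving from state s to s' under action a.
# Z[a][s', o] : probability of observing o in state s' after action a.
T = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.4, 0.6]])]
Z = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.9, 0.1], [0.1, 0.9]])]

def belief_update(b, a, o):
    """Bayes update: b'(s') ∝ Z[a][s', o] * sum_s T[a][s, s'] * b(s)."""
    b_pred = b @ T[a]            # predict next-state distribution
    b_new = Z[a][:, o] * b_pred  # weight by observation likelihood
    return b_new / b_new.sum()   # normalize

b = np.array([0.5, 0.5])         # uniform initial belief
b = belief_update(b, a=0, o=1)
```

Even in this toy model the reachable set of beliefs grows with every action-observation pair, which is why exact synthesis over the belief space is generally infeasible.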

Cited by 10 publications (4 citation statements)
References 37 publications (43 reference statements)
“…Junges et al (2018) construct an FSC using parameter synthesis for Markov chains, which is known to be ETR-complete (Junges, Katoen, et al, 2021), whereas NP ⊆ ETR ⊆ PSPACE. Carr et al (2018) render common POMDP scenarios as arcade games to capture human preferences that are formally cast into FSCs and subsequently verified. Ahmadi et al (2020) use control barrier functions to compute safe reachable sets in the belief space of POMDPs.…”
Section: Related Work (mentioning; confidence: 99%)
“…As they depend on probability distributions on partially observed states, optimal policies for mixed-observability MDPs and POMDPs are generally difficult to compute exactly [40], [41]. In this simulation, we used a randomized approximation of an optimal policy based on combining optimal actions for MDPs where beliefs are known, with weights corresponding to the probability distribution of the beliefs [42]. The light blue graph in Figure 5 describes average rewards (17), in analogy to the left side of Figure 3.…”
Section: Optimal Deception With Imperfect Knowledge (mentioning; confidence: 99%)
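The citing work above describes a randomized approximation: it combines the optimal actions of the fully observed MDPs, sampling each with probability equal to the current belief in the corresponding mode. A minimal sketch under assumed names (the `opt_action` table and mode labels are hypothetical placeholders for precomputed MDP-optimal policies, not from the cited paper):

```python
import numpy as np

# Hypothetical: per fully-observed "mode" m, an MDP-optimal action is
# assumed to have been precomputed offline.
opt_action = {0: "left", 1: "right"}

def randomized_policy(belief, rng):
    """Sample mode m with probability belief[m], then play its optimal action."""
    modes = list(opt_action)
    m = rng.choice(modes, p=[belief[m] for m in modes])
    return opt_action[m]

rng = np.random.default_rng(0)
a = randomized_policy({0: 0.3, 1: 0.7}, rng)  # "left" w.p. 0.3, "right" w.p. 0.7
```

The design choice is the usual trade-off: this avoids planning in the continuous belief space entirely, at the cost of optimality guarantees that exact POMDP solutions would provide.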
“…In contrast to our work, neither [20] nor [21] specializes on planning problems, and there is neither an implementation available nor any analysis how well these methods scale to systems of relevant size. Instead of automated abstraction, an interactive human-in-the-loop approach for strategy synthesis in POMDPs is described in [22], but such an approach, in contrast to the method described here, may not be fully automated. The strategies obtained by the method in this paper are finitememory strategies.…”
Section: Introduction (mentioning; confidence: 99%)