2018 Annual American Control Conference (ACC)

# Human-in-the-Loop Synthesis for Partially Observable Markov Decision Processes

**Abstract:** We study planning problems where autonomous agents operate inside environments that are subject to uncertainties and not fully observable. Partially observable Markov decision processes (POMDPs) are a natural formal model to capture such problems. Because of the potentially huge or even infinite belief space in POMDPs, synthesis with safety guarantees is, in general, computationally intractable. We propose an approach that aims to circumvent this difficulty: in scenarios that can be partially or fully simulate…
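To make concrete why the belief space mentioned in the abstract grows so quickly, here is a minimal sketch of a discrete POMDP belief update (a Bayes filter step). The model matrices `T` and `O` and the two-state example are illustrative assumptions, not taken from the paper; the belief after each step is a full probability distribution over states, which is what makes exact planning over beliefs intractable in general.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One Bayes-filter step for a discrete POMDP.

    b: current belief over states (1-D probability vector)
    a: action index; o: observation index
    T[a][s, s']: transition probabilities under action a
    O[a][s', o]: probability of observing o in successor state s'
    """
    # Predict: push the belief through the transition model for action a.
    predicted = b @ T[a]
    # Correct: weight each successor state by the likelihood of observation o.
    unnormalized = predicted * O[a][:, o]
    return unnormalized / unnormalized.sum()

# Tiny 2-state example (hypothetical numbers): a noisy sensor sharpens the belief.
T = [np.array([[0.9, 0.1],
               [0.2, 0.8]])]   # one action
O = [np.array([[0.8, 0.2],
               [0.3, 0.7]])]   # P(observation | successor state)
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=0, T=T, O=O)
```

Even in this two-state toy, `b1` is a point in a continuous simplex; with many states the reachable set of such points can be huge or infinite, which is the difficulty the paper's human-in-the-loop approach aims to circumvent.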


**Citation statements (4):**


“…Junges et al (2018) construct an FSC using parameter synthesis for Markov chains, which is known to be ETR-complete (Junges, Katoen, et al, 2021), whereas NP ⊆ ETR ⊆ PSPACE. Carr et al (2018) render common POMDP scenarios as arcade games to capture human preferences that are formally cast into FSCs and subsequently verified. Ahmadi et al (2020) use control barrier functions to compute safe reachable sets in the belief space of POMDPs.…”



“…As they depend on probability distributions on partially observed states, optimal policies for mixed-observability MDPs and POMDPs are generally difficult to compute exactly [40], [41]. In this simulation, we used a randomized approximation of an optimal policy based on combining optimal actions for MDPs where beliefs are known, with weights corresponding to the probability distribution of the beliefs [42]. The light blue graph in Figure 5 describes average rewards (17), in analogy to the left side of Figure 3.…”
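The randomized approximation described in the statement above can be sketched briefly: sample a state from the current belief, then play the action that is optimal for the fully observable MDP in that state. All names here (`belief`, `policy`, the two states and actions) are illustrative assumptions, not taken from the cited work.

```python
import random

def randomized_policy(belief, mdp_policy, rng=None):
    """Draw a state with probability equal to the current belief,
    then return that state's optimal MDP action."""
    rng = rng or random.Random()
    states = list(belief)
    s = rng.choices(states, weights=[belief[s] for s in states], k=1)[0]
    return mdp_policy[s]

# Illustrative two-state belief and per-state optimal actions.
belief = {"s0": 0.7, "s1": 0.3}
policy = {"s0": "left", "s1": "right"}
action = randomized_policy(belief, policy)
```

Averaged over many draws, the action frequencies match the belief weights, which is the "weights corresponding to the probability distribution of the beliefs" idea in the quote.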


“…In contrast to our work, neither [20] nor [21] specializes on planning problems, and there is neither an implementation available nor any analysis how well these methods scale to systems of relevant size. Instead of automated abstraction, an interactive human-in-the-loop approach for strategy synthesis in POMDPs is described in [22], but such an approach, in contrast to the method described here, may not be fully automated. The strategies obtained by the method in this paper are finite-memory strategies.…”
