2020
DOI: 10.1007/978-3-030-53291-8_26

Optimistic Value Iteration

Abstract: Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two "sound" variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration's ability to usually deliver good lower bounds: we obtain a lower b…
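The abstract's core idea — iterate from below as usual, then optimistically guess an upper bound and verify it — can be sketched as follows. This is a minimal illustration on a hypothetical three-state Markov chain, not the paper's full algorithm (which handles MDPs, rewards, and retries failed guesses):

```python
import numpy as np

# Hypothetical three-state Markov chain: state 0 (initial), 1 (goal), 2 (sink).
# P[s, t] is the probability of moving from s to t; we want P(reach goal).
P = np.array([[0.5, 0.3, 0.2],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
GOAL, SINK = 1, 2

def bellman(v):
    """One application of the Bellman operator for reachability."""
    new = P @ v
    new[GOAL], new[SINK] = 1.0, 0.0  # goal/sink values are fixed
    return new

def optimistic_vi(eps=1e-6):
    """Sketch of the optimistic scheme: classic value iteration yields a
    good lower bound; we then guess lo + eps as an upper bound and verify
    the guess inductively (the Bellman operator maps it to something no
    larger, so it stays above the true fixed point)."""
    lo = np.zeros(3)
    lo[GOAL] = 1.0
    while True:
        new = bellman(lo)
        done = np.max(np.abs(new - lo)) < eps / 10
        lo = new
        if done:
            break
    hi = np.minimum(lo + eps, 1.0)  # optimistic guess
    if not np.all(bellman(hi) <= hi + 1e-12):
        raise RuntimeError("guess not inductive; full OVI would refine and retry")
    return lo, hi

lo, hi = optimistic_vi()  # true value from state 0 is 0.6
```

The retry logic of the actual algorithm is omitted here: on this contracting example the first guess is always verified.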

Cited by 42 publications (54 citation statements)
References 30 publications
“…One way to combat these problems is to approach the solution from both directions, a technique referred to as interval iteration [15,23,58]. Storm implements the latter and additionally the more recent sound value iteration [110] and optimistic value iteration [71]. Numerical errors aside, these methods ensure a correct result within a user-defined accuracy and come with a small time penalty as shown in Sect.…”
Section: Exact and Sound Model Checkingmentioning
confidence: 99%
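Interval iteration, as cited in this statement, approaches the fixed point from both directions at once. A minimal sketch on a hypothetical three-state Markov chain (assuming the usual qualitative preprocessing has already pinned down the goal and sink states, which is what makes the upper sequence converge):

```python
import numpy as np

# Hypothetical three-state Markov chain; compute P(reach goal) from state 0.
P = np.array([[0.5, 0.3, 0.2],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
GOAL, SINK = 1, 2

def step(v):
    """One Bellman step for reachability probabilities."""
    new = P @ v
    new[GOAL], new[SINK] = 1.0, 0.0
    return new

def interval_iteration(eps=1e-6):
    """Iterate a lower bound (from 0) and an upper bound (from 1)
    until they are eps-close; the true value is bracketed throughout."""
    lo = np.zeros(3); lo[GOAL] = 1.0
    hi = np.ones(3);  hi[SINK] = 0.0
    while np.max(hi - lo) > eps:
        lo, hi = step(lo), step(hi)
    return lo, hi

lo, hi = interval_iteration()  # true value from state 0 is 0.6
```

On general MDPs, identifying end components beforehand is essential; without that preprocessing the upper sequence need not converge, which is exactly the problem interval iteration was introduced to fix.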
“…The maximal total reward in M can be computed using standard techniques such as value iteration and policy iteration [46] as well as the more recent sound value iteration and optimistic value iteration [48,36]. The latter two provide sound precision guarantees for the output value…”
Section: Pure Long-run Average Queriesmentioning
confidence: 99%
“…The supremum sup {Ex^σ(lra(R_w)) | σ ∈ Σ} is attained by some memoryless deterministic strategy σ_w ∈ Σ^md [30]. Such a strategy and the induced value v_w = Ex^{σ_w}(lra(R_w)) can be computed (or approximated) with linear programming [30], strategy iteration [42] or value iteration [17,1]. The maximal total reward in M can be computed using standard techniques such as value iteration and policy iteration [46] as well as the more recent sound value iteration and optimistic value iteration [48,36]. The latter two provide sound precision guarantees for the output value v, i.e., |v − max{Ex^σ_{M,s_I}(tot(R^*)) | σ ∈ Σ_M}| ≤ ε for a given ε > 0.…”
Section: Pure Long-run Average Queriesmentioning
confidence: 99%
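The total-reward value iteration referenced in these statements can be sketched as follows. The two-state MDP and its rewards are invented for illustration, and the sketch produces only the standard lower iterates — it does not provide the soundness certificate |v − v*| ≤ ε that sound or optimistic value iteration add on top:

```python
import numpy as np

# Hypothetical MDP with states s0, s1 and a terminal state T (index 2);
# maximise expected total reward. Each action is a pair
# (distribution over [s0, s1, T], immediate reward).
ACTIONS = {
    0: [(np.array([0.0, 0.9, 0.1]), 1.0),   # from s0: risky step, reward 1
        (np.array([0.0, 0.0, 1.0]), 0.5)],  # from s0: stop now, reward 0.5
    1: [(np.array([0.0, 0.0, 1.0]), 2.0)],  # from s1: reward 2, then stop
}

def total_reward_vi(eps=1e-8, max_iter=10_000):
    """Standard value iteration for maximal expected total reward:
    apply the Bellman optimality operator starting from the zero vector
    and stop when the iterates are (numerically) stable."""
    v = np.zeros(3)  # index 2 is the terminal state, value 0 forever
    for _ in range(max_iter):
        new = v.copy()
        for s, acts in ACTIONS.items():
            new[s] = max(r + p @ v for p, r in acts)
        if np.max(np.abs(new - v)) < eps:
            return new
        v = new
    return v

v = total_reward_vi()  # optimal: v(s0) = 1 + 0.9*2 = 2.8, v(s1) = 2
```

The small-residual stopping rule is exactly what the quoted statements warn about: it bounds the change per step, not the distance to the true value, which is why the sound variants are needed for guaranteed accuracy.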