2014
DOI: 10.2168/lmcs-10(1:13)2014
Markov Decision Processes with Multiple Long-run Average Objectives

Abstract: We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with k limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limit-average value stays above a given vector. We show that under the expectation objec…
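The distinction the abstract draws between the two objectives can be illustrated with a minimal sketch (not from the paper): assume each run already has a known limit-average vector, and compare the expected value against the probability of componentwise satisfying a threshold. All names and the toy distribution are illustrative.

```python
# Toy distribution over runs: (probability, limit-average value vector).
# Two runs, each good in one dimension and poor in the other.
runs = [
    (0.5, (3.0, 1.0)),
    (0.5, (1.0, 3.0)),
]

def expected_value(runs):
    """Expectation objective: the expected limit-average vector."""
    k = len(runs[0][1])
    return tuple(sum(p * v[i] for p, v in runs) for i in range(k))

def satisfaction(runs, threshold):
    """Satisfaction objective: probability that a run's limit-average
    vector stays componentwise at or above the threshold vector."""
    return sum(p for p, v in runs
               if all(vi >= ti for vi, ti in zip(v, threshold)))

print(expected_value(runs))            # (2.0, 2.0)
print(satisfaction(runs, (2.0, 2.0)))  # 0.0
```

The example shows why the objectives differ: the expected vector (2.0, 2.0) meets the threshold, yet no individual run does, so the satisfaction probability is 0.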


Cited by 33 publications (54 citation statements). References 27 publications (76 reference statements).
“…The need for infinite memory was proved in [6, Section 5] for the problem of ensuring thresholds: in Fig. 4, v_1 = v_2 = 0.5 and α = 1 can be ensured by an infinite-memory strategy, and finite-memory strategies can only achieve these thresholds with probability 0.…”
Section: Percentiles on Multi-dimensional MP
confidence: 99%
“…The linear program follows the ideas of [18,6]. Note that the first two lines of (L) correspond to the multiple reachability LP of [18] for absorbing target states.…”
Section: Percentiles on Multi-dimensional MP
confidence: 99%
“…For example, although at multiple places we build on the techniques of [13] and [2] which allow us to deal with maximal end components (sometimes called strongly communicating sets) of an MDP separately, we often need to extend these techniques. Unlike the works [13] and [2] which study multiple "independent" objectives, in the case of the global variance any change of value in the expected mean payoff implies a change of value of the variance.…”
Section: (Zero Variance)
confidence: 99%
“…Then the user gets a 2 Mbits/sec connection almost surely, but since the individual runs are apparently "unstable", he may still see a lot of stuttering in the video stream. As an appropriate measure for the stability of individual runs, we propose local variance, which is defined as the long-run average of (r_i(ω) − mp(ω))², where r_i(ω) is the reward of the i-th action executed in a run ω and mp(ω) is the mean payoff of ω. Hence, local variance says how much the rewards of the actions executed along a given run deviate from the mean payoff of the run on average.…”
Section: Introduction
confidence: 99%
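The local-variance definition quoted above can be sketched numerically: for a finite run prefix, approximate the long-run averages by plain averages over the prefix. This is an illustrative approximation under that assumption; the function names are not from the cited work.

```python
def mean_payoff(rewards):
    """Average reward of a finite run prefix (approximates mp(omega))."""
    return sum(rewards) / len(rewards)

def local_variance(rewards):
    """Average squared deviation of the step rewards from the mean
    payoff, approximating the long-run average of (r_i - mp)^2."""
    mp = mean_payoff(rewards)
    return sum((r - mp) ** 2 for r in rewards) / len(rewards)

# A run alternating rewards 0 and 4 and a constant-2 run have the same
# mean payoff (2.0), but very different stability:
print(local_variance([0, 4, 0, 4]))  # 4.0 — unstable run
print(local_variance([2, 2, 2, 2]))  # 0.0 — perfectly stable run
```

This mirrors the video-streaming example: both runs deliver the same average rate, but only the constant run avoids "stuttering", and local variance is exactly the quantity that separates them.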