Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes

Brázdil, Tomáš; Brožek, Václav; Chatterjee, Krishnendu; Forejt, Vojtěch; Kučera, Antonín

doi:10.1109/lics.2011.10

Cited by 43 publications

(115 citation statements)

References 16 publications

Supporting

Mentioning

114

Contrasting


“…Similar concepts for a strategy σ of the Spoiler are defined analogously. In this paper we use an alternative formulation of strategy [1] that generalises the concept of strategy automata [6].…”

Section: Definition 1 (Stochastic Game Arena) (mentioning)

confidence: 99%

“…Strategies are expressed as strategy automata [6,1] that consist of i) a set of memory elements, ii) a memory update function that specifies how memory is updated as the transitions occur in the game arena, and iii) a next move function that specifies a distribution over the successors of game state, depending on the memory element. Memory update functions in strategy automata can be either deterministic or stochastic [1].…”

Section: Introduction (mentioning)

confidence: 99%
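
The three components named in the statement above can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not the construction from the cited papers: the names `StrategyAutomaton`, `sample`, and the toy `alternating` strategy are hypothetical, every state is treated as controlled by the strategy's owner (no opponent or stochastic states), and a deterministic memory update is modelled as a distribution with a single outcome.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List

State = Hashable
Memory = Hashable
Dist = Dict[Hashable, float]  # finite probability distribution: outcome -> probability


def sample(dist: Dist, rng: random.Random) -> Hashable:
    """Draw one outcome from a finite distribution."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


@dataclass
class StrategyAutomaton:
    # i) memory elements are implicit: whatever values `update` can return
    initial_memory: Memory
    # ii) memory update: (memory, state just entered) -> distribution over memory
    update: Callable[[Memory, State], Dist]
    # iii) next move: (current state, memory) -> distribution over successors
    next_move: Callable[[State, Memory], Dist]

    def play(self, start: State, steps: int, rng: random.Random) -> List[State]:
        """Simulate `steps` transitions, assuming every state is ours to control."""
        memory, state = self.initial_memory, start
        path = [state]
        for _ in range(steps):
            state = sample(self.next_move(state, memory), rng)
            memory = sample(self.update(memory, state), rng)
            path.append(state)
        return path


# Hypothetical toy strategy with two memory elements {0, 1}:
# memory 0 moves to "a", memory 1 moves to "b", and memory flips after each step.
alternating = StrategyAutomaton(
    initial_memory=0,
    update=lambda m, s: {1 - m: 1.0},                  # deterministic update
    next_move=lambda s, m: {"a" if m == 0 else "b": 1.0},
)

path = alternating.play("a", 4, random.Random(0))
print(path)
```

Replacing the single-outcome dictionary in `update` with one that spreads probability over several memory elements gives a stochastic memory update in the sense of the statement.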

“…Memory update functions in strategy automata can be either deterministic or stochastic [1]. We show that the choice of how the memory is updated drastically influences the size of memory required.…”

Section: Introduction (mentioning)

confidence: 99%

“…The precise value problem studied here is a special case of multiobjective optimisation, where a player strives to fulfill several (in our case two) objectives at once, each with a certain minimum probability. Multi-objective optimisation has been studied for Markov decision processes with discounted rewards [4], long-run average rewards [1], as well as reachability and ω-regular objectives [7]; however, none of these works consider multi-player optimisation.…”

Section: Introduction (mentioning)

confidence: 99%


Playing Stochastic Games Precisely

Chen

Forejt

Kwiatkowska

et al. 2012

Lecture Notes in Computer Science


Abstract. We study stochastic two-player games where the goal of one player is to achieve precisely a given expected value of the objective function, while the goal of the opponent is the opposite. Potential applications for such games include controller synthesis problems where the optimisation objective is to maximise or minimise a given payoff function while respecting a strict upper or lower bound, respectively. We consider a number of objective functions including reachability, ω-regular, discounted reward, and total reward. We show that precise value games are not determined, and compare the memory requirements for winning strategies. For stopping games we establish necessary and sufficient conditions for the existence of a winning strategy of the controller for a large class of functions, as well as provide the constructions of compact strategies for the studied objectives.


“…For non-stochastic games, multi-dimensional objectives have been considered in [6,23]. For MDPs, multiple discounted objectives [5], long-run objectives [2], ω-regular objectives [9] and total rewards [12] have been analysed. The objectives that we study in this paper are a special case of branching time temporal logics for stochastic games [3,1].…”

Section: Introduction (mentioning)

confidence: 99%

On Stochastic Games with Multiple Objectives

Chen

Forejt

Kwiatkowska

et al. 2013

Mathematical Foundations of Computer Science 2013



Abstract. We study two-player stochastic games, where the goal of one player is to satisfy a formula given as a positive boolean combination of expected total reward objectives and the behaviour of the second player is adversarial. Such games are important for modelling, synthesis and verification of open systems with stochastic behaviour. We show that finding a winning strategy is PSPACE-hard in general and undecidable for deterministic strategies. We also prove that optimal strategies, if they exist, may require infinite memory and randomisation. However, when restricted to disjunctions of objectives only, memoryless deterministic strategies suffice, and the problem of deciding whether a winning strategy exists is NP-complete. We also present algorithms to approximate the Pareto sets of achievable objectives for the class of stopping games.


Run-Time Optimization for Learned Controllers Through Quantitative Games

Avni

Bloem

Chatterjee

et al. 2019

Computer Aided Verification

Self Cite


A controller is a device that interacts with a plant. At each time point, it reads the plant's state and issues commands with the goal that the plant operates optimally. Constructing optimal controllers is a fundamental and challenging problem. Machine learning techniques have recently been successfully applied to train controllers, yet they have limitations. Learned controllers are monolithic and hard to reason about. In particular, it is difficult to add features without retraining, to guarantee any level of performance, and to achieve acceptable performance when encountering untrained scenarios. These limitations can be addressed by deploying quantitative run-time shields that serve as a proxy for the controller. At each time point, the shield reads the command issued by the controller and may choose to alter it before passing it on to the plant. We show how optimal shields that interfere as little as possible while guaranteeing a desired level of controller performance can be generated systematically and automatically using reactive synthesis. First, we abstract the plant by building a stochastic model. Second, we consider the learned controller to be a black box. Third, we measure controller performance and shield interference by two quantitative run-time measures that are formally defined using weighted automata. Then, the problem of constructing a shield that guarantees maximal performance with minimal interference is the problem of finding an optimal strategy in a stochastic 2-player game "controller versus shield" played on the abstract state space of the plant with a quantitative objective obtained from combining the performance and interference measures. We illustrate the effectiveness of our approach by automatically constructing lightweight shields for learned traffic-light controllers in various road networks. The shields we generate avoid liveness bugs, improve controller performance in untrained and changing traffic situations, and add features to learned controllers, such as giving priority to emergency vehicles.

