2016
DOI: 10.1007/978-3-662-49674-9_8

Safety-Constrained Reinforcement Learning for MDPs

Abstract. We consider controller synthesis for stochastic and partially unknown environments in which safety is essential. Specifically, we abstract the problem as a Markov decision process in which the expected performance is measured using a cost function that is unknown prior to run-time exploration of the state space. Standard learning approaches synthesize cost-optimal strategies without guaranteeing safety properties. To remedy this, we first compute safe, permissive strategies. Then, exploration is cons…
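
The two-phase scheme in the abstract can be illustrated with a small sketch. It is a hypothetical toy, not the paper's algorithm: the abstract is cut off before it explains how exploration is handled, and here we simply assume that learning only ever explores actions admitted by the safe strategies. As a conservative stand-in for the paper's permissive strategies, the sketch keeps every action that is optimal for minimising the probability of reaching a bad state (computed by value iteration on an assumed-known transition structure), then runs plain tabular Q-learning on costs that are revealed only when an action is executed. All names (`TRANSITIONS`, `HIDDEN_COST`, `safe_q_learning`, the threshold value) are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP: TRANSITIONS[s][a] = list of (probability, next_state).
# The transition structure is assumed known (needed for the safety analysis);
# the per-action costs are hidden from the learner and only observed by acting.
TRANSITIONS = {
    "start": {
        "risky":     [(0.9, "goal"), (0.1, "bad")],
        "safe_slow": [(0.5, "start"), (0.5, "goal")],
        "safe_fast": [(1.0, "goal")],
    },
}
TERMINAL = {"goal", "bad"}
BAD = {"bad"}
HIDDEN_COST = {"risky": 1.0, "safe_slow": 3.0, "safe_fast": 4.0}  # unknown to the agent


def min_reach_prob(transitions, bad, iters=1000):
    """Value iteration for the minimal probability of eventually reaching `bad`."""
    v = defaultdict(float)  # non-bad terminal states keep value 0
    for s in bad:
        v[s] = 1.0
    for _ in range(iters):
        for s, acts in transitions.items():
            v[s] = min(sum(p * v[t] for p, t in succ) for succ in acts.values())
    return v


def permissive_safe_actions(transitions, bad, tol=1e-9):
    """Keep every action that is optimal for minimising the bad-state probability.

    Any strategy using only these actions reaches `bad` with probability at most
    v[s0] (up to numerical tolerance). This is a conservative stand-in, not a
    maximally permissive strategy.
    """
    v = min_reach_prob(transitions, bad)
    allowed = {
        s: [a for a, succ in acts.items()
            if sum(p * v[t] for p, t in succ) <= v[s] + tol]
        for s, acts in transitions.items()
    }
    return v, allowed


def sample_next(succ, rng):
    """Draw a successor state from a list of (probability, state) pairs."""
    r, acc = rng.random(), 0.0
    for p, t in succ:
        acc += p
        if r < acc:
            return t
    return succ[-1][1]


def safe_q_learning(start, threshold, episodes=5000, gamma=0.9, eps=0.3, seed=0):
    v, allowed = permissive_safe_actions(TRANSITIONS, BAD)
    if v[start] > threshold:
        raise ValueError("no strategy satisfies the safety threshold")
    rng = random.Random(seed)
    q = defaultdict(float)      # estimated expected discounted cost per (state, action)
    visits = defaultdict(int)   # visit counts give a 1/N learning rate
    executed = set()
    for _ in range(episodes):
        s = start
        while s not in TERMINAL:
            acts = allowed[s]   # exploration never leaves the safe action set
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = min(acts, key=lambda b: q[(s, b)])
            executed.add((s, a))
            cost = HIDDEN_COST[a]   # cost is revealed only after acting
            t = sample_next(TRANSITIONS[s][a], rng)
            future = 0.0 if t in TERMINAL else min(q[(t, b)] for b in allowed[t])
            visits[(s, a)] += 1
            q[(s, a)] += (cost + gamma * future - q[(s, a)]) / visits[(s, a)]
            s = t
    greedy = {s: min(allowed[s], key=lambda b: q[(s, b)]) for s in allowed}
    return allowed, greedy, executed
```

On this toy model, `safe_q_learning("start", threshold=0.05)` excludes `risky` from the allowed set even though it is the cheapest action, so it is never explored and the learned greedy choice should be `safe_fast` rather than the unsafe shortcut.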

Cited by 74 publications (53 citation statements). References: 22 publications.
“…Besides suitability, we consider safety of system behavior. Unaltered RL algorithms use trial-and-error style exploration to optimize their behavior yet this may not suit a particular domain [78,92,136,153]. For example, tailoring the insulin delivery policy of an artificial pancreas to the metabolism of an individual requires trial insulin delivery action but these should only be sampled when their outcome is within safe certainty bounds [44].…”
Section: A Classification Of Personalization Settings (mentioning; confidence: 99%)
“…multi-objective mean-payoff objectives [8], objectives over instantaneous costs [10], and parity objectives [7]. Multi-objective problems for MDPs with an unknown cost-function are considered in [33]. Surveys on multi-objective decision making in AI and machine learning can be found in [44] and [47], respectively.…”
Section: Introduction (mentioning; confidence: 99%)
“…A trajectory-based algorithm which combines policy gradient and actor-critic methods was presented to solve a CVaR-constrained problem (Chow et al 2017). For robust MDP problems, with considering a set of general uncertainties (random action, unknown cost and safety hazards), an approach was provided to compute safe and optimal strategies iteratively (Junges et al 2016). Q-learning has also been used to provide risk-sensitive analysis on the fMRI signals, which provides a better interpretation of the human behavior in a sequential decision task (Shen et al 2014).…”
Section: Related Work (mentioning; confidence: 99%)