2019
DOI: 10.1007/978-3-030-25540-4_36
Run-Time Optimization for Learned Controllers Through Quantitative Games

Abstract: A controller is a device that interacts with a plant. At each time point, it reads the plant's state and issues commands with the goal that the plant operates optimally. Constructing optimal controllers is a fundamental and challenging problem. Machine learning techniques have recently been successfully applied to train controllers, yet they have limitations. Learned controllers are monolithic and hard to reason about. In particular, it is difficult to add features without retraining, to guarantee any level of…

Cited by 35 publications (30 citation statements); references 35 publications.
“…Our monitoring problem can be phrased as a special case of verification of partially observable stochastic games [20], but automatic techniques for those very general models are lacking. Likewise, the idea of shielding (pre)computes all action choices that lead to safe behavior [3,5,15,24,34,35]. For partially observable settings, shielding again requires computing partial-information schedulers [21,39], contrary to our approach.…”
Section: Related Work
confidence: 99%
“…We omit the (single) sensor state for conciseness. To avoid growth, one may use fixed-precision numbers that over-approximate the probability of being in any state, inducing a growing (but conservative) error.…”
confidence: 99%
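The fixed-precision idea in the statement above can be sketched as follows. This is a minimal illustration, assuming a hypothetical two-state hidden Markov model; the transition numbers, the `GRID` precision, and the function names are illustrative inventions, not taken from the cited paper. The point is only that rounding every bound upward keeps the tracked values conservative while the error grows over time:

```python
import math

# Hypothetical 2-state hidden Markov model (numbers are illustrative).
T = [[0.9, 0.1], [0.2, 0.8]]               # T[s][s2] = P(s2 | s)
O = {"ok": [0.7, 0.3], "err": [0.3, 0.7]}  # O[obs][s] = P(obs | s)

GRID = 1000  # fixed precision: bounds are stored as multiples of 1/1000

def round_up(p):
    # Conservative rounding: never underestimate a probability.
    return math.ceil(p * GRID) / GRID

def exact_step(belief, obs):
    # Exact (unnormalized) belief update after observing obs.
    return [sum(belief[s] * T[s][s2] for s in range(2)) * O[obs][s2]
            for s2 in range(2)]

def bounded_step(bound, obs):
    # Same update on upper bounds, rounded up after each step: the
    # error accumulates, but always on the safe side.
    return [round_up(p) for p in exact_step(bound, obs)]

exact, bound = [0.5, 0.5], [0.5, 0.5]
for obs in ["ok", "err", "ok"]:
    exact = exact_step(exact, obs)
    bound = bounded_step(bound, obs)

# The fixed-precision bounds dominate the exact probabilities.
print(all(b >= e for b, e in zip(bound, exact)))
```

Because the belief update has nonnegative coefficients, it is monotone, so componentwise upper bounds are preserved across steps even though they drift away from the exact values.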
“…For future work, we plan to investigate the application of online shielding in other settings, such as decision making in robotics and control. Another interesting extension would be to incorporate quantitative performance measures in the form of rewards and costs into the computation of the online shield, as previously demonstrated in an offline manner [4] and in a hybrid approach [29], where runtime information was used to learn the environment dynamics.…”
Section: Discussion
confidence: 99%
“…Shields are usually constructed offline by computing a maximally permissive policy containing all actions that will not violate the safety specification. Several extensions exist [4,6,29,39]. The shielding approach has been shown to be successful in combination with RL [2,21].…”
Section: Related Work
confidence: 99%
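The offline shield construction described in the statement above (keeping exactly those actions from which safety can still be enforced) can be sketched as a fixed-point computation. This is a simplified one-player sketch under stated assumptions: the game graph, the state and action names, and the `maximally_permissive_shield` helper are hypothetical, and the adversarial environment player of a full safety game is omitted for brevity:

```python
# Hypothetical controller graph: actions[s] maps each state to its
# {action: successor} edges. "bad" violates the safety specification.
actions = {
    "s0": {"a": "s1", "b": "bad"},
    "s1": {"a": "s0", "b": "s2"},
    "s2": {"a": "s2"},
    "bad": {},
}
unsafe = {"bad"}

def maximally_permissive_shield(actions, unsafe):
    """Iteratively prune actions whose successor left the safe region;
    states with no safe action left become losing and are removed.
    The fixed point is the maximally permissive safe policy."""
    allowed = {s: dict(acts) for s, acts in actions.items() if s not in unsafe}
    changed = True
    while changed:
        changed = False
        for s in list(allowed):
            allowed[s] = {a: t for a, t in allowed[s].items() if t in allowed}
            if not allowed[s]:      # no safe action remains: losing state
                del allowed[s]
                changed = True
    return allowed

shield = maximally_permissive_shield(actions, unsafe)
print(sorted(shield))        # states that keep at least one safe action
print(sorted(shield["s0"]))  # in s0 only "a" survives; "b" led to "bad"
```

A run-time shield then simply blocks any action the learned controller proposes that is absent from `shield[current_state]`, leaving all other choices untouched, which is what "maximally permissive" means here.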