2019
DOI: 10.1007/978-3-030-25540-4_36
Run-Time Optimization for Learned Controllers Through Quantitative Games

Abstract: A controller is a device that interacts with a plant. At each time point, it reads the plant's state and issues commands with the goal that the plant operates optimally. Constructing optimal controllers is a fundamental and challenging problem. Machine learning techniques have recently been successfully applied to train controllers, yet they have limitations. Learned controllers are monolithic and hard to reason about. In particular, it is difficult to add features without retraining, to guarantee any level of…

Cited by 35 publications (30 citation statements); references 35 publications.
“…Our monitoring problem can be phrased as a special case of verification of partially observable stochastic games [20], but automatic techniques for those very general models are lacking. Likewise, the idea of shielding (pre)computes all action choices that lead to safe behavior [3,5,15,24,34,35]. For partially observable settings, shielding again requires computing partial-information schedulers [21,39], contrary to our approach.…”
Section: Related Work
confidence: 99%
“…We omit the (single) sensor state for conciseness. To avoid growth, one may use fixed-precision numbers that over-approximate the probability of being in any state, inducing a growing (but conservative) error.…”
confidence: 99%
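The fixed-precision idea in the statement above can be sketched as follows. This is a minimal illustration, assuming a hypothetical two-state hidden Markov model; the transition numbers, the `GRID` precision, and the function names are illustrative inventions, not taken from the cited paper. The point is only that rounding every bound upward keeps the tracked values conservative while the error grows over time:

```python
import math

# Hypothetical 2-state hidden Markov model (numbers are illustrative).
T = [[0.9, 0.1], [0.2, 0.8]]               # T[s][s2] = P(s2 | s)
O = {"ok": [0.7, 0.3], "err": [0.3, 0.7]}  # O[obs][s] = P(obs | s)

GRID = 1000  # fixed precision: bounds are stored as multiples of 1/1000

def round_up(p):
    # Conservative rounding: never underestimate a probability.
    return math.ceil(p * GRID) / GRID

def exact_step(belief, obs):
    # Exact (unnormalized) belief update after observing obs.
    return [sum(belief[s] * T[s][s2] for s in range(2)) * O[obs][s2]
            for s2 in range(2)]

def bounded_step(bound, obs):
    # Same update on upper bounds, rounded up after each step: the
    # error accumulates, but always on the safe side.
    return [round_up(p) for p in exact_step(bound, obs)]

exact, bound = [0.5, 0.5], [0.5, 0.5]
for obs in ["ok", "err", "ok"]:
    exact = exact_step(exact, obs)
    bound = bounded_step(bound, obs)

# The fixed-precision bounds dominate the exact probabilities.
print(all(b >= e for b, e in zip(bound, exact)))
```

Because the belief update has nonnegative coefficients, it is monotone, so componentwise upper bounds are preserved across steps even though they drift away from the exact values.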
“…For future work, we plan to investigate the application of online shielding in other settings, such as decision making in robotics and control. Another interesting extension would be to incorporate quantitative performance measures in the form of rewards and costs into the computation of the online shield, as previously demonstrated in an offline manner [4] and in a hybrid approach [29], where runtime information was used to learn the environment dynamics.…”
Section: Discussion
confidence: 99%
“…Shields are usually constructed offline by computing a maximally permissive policy containing all actions that will not violate the safety specification. Several extensions exist [4,6,29,39]. The shielding approach has been shown to be successful in combination with RL [2,21].…”
Section: Related Work
confidence: 99%
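The offline shield construction described in the statement above (keeping exactly those actions from which safety can still be enforced) can be sketched as a fixed-point computation. This is a simplified one-player sketch under stated assumptions: the game graph, the state and action names, and the `maximally_permissive_shield` helper are hypothetical, and the adversarial environment player of a full safety game is omitted for brevity:

```python
# Hypothetical controller graph: actions[s] maps each state to its
# {action: successor} edges. "bad" violates the safety specification.
actions = {
    "s0": {"a": "s1", "b": "bad"},
    "s1": {"a": "s0", "b": "s2"},
    "s2": {"a": "s2"},
    "bad": {},
}
unsafe = {"bad"}

def maximally_permissive_shield(actions, unsafe):
    """Iteratively prune actions whose successor left the safe region;
    states with no safe action left become losing and are removed.
    The fixed point is the maximally permissive safe policy."""
    allowed = {s: dict(acts) for s, acts in actions.items() if s not in unsafe}
    changed = True
    while changed:
        changed = False
        for s in list(allowed):
            allowed[s] = {a: t for a, t in allowed[s].items() if t in allowed}
            if not allowed[s]:      # no safe action remains: losing state
                del allowed[s]
                changed = True
    return allowed

shield = maximally_permissive_shield(actions, unsafe)
print(sorted(shield))        # states that keep at least one safe action
print(sorted(shield["s0"]))  # in s0 only "a" survives; "b" led to "bad"
```

A run-time shield then simply blocks any action the learned controller proposes that is absent from `shield[current_state]`, leaving all other choices untouched, which is what "maximally permissive" means here.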