2021
DOI: 10.1007/978-3-030-81688-9_30
Model-Free Reinforcement Learning for Branching Markov Decision Processes

Abstract: We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to s…
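To make the abstract's model concrete, here is a minimal simulation sketch of a BMDP as described: the state is a collection of typed entities, each entity generates a payoff and spawns offspring according to a probabilistic pattern, and a controller picks one option per type. All type names, actions, rewards, and probabilities below are invented for illustration and are not taken from the paper.

```python
import random

# A hypothetical BMDP for illustration: each entity type maps actions to a
# (reward, offspring_distribution) pair, where the distribution lists
# (probability, offspring_types) pairs.
BMDP = {
    "a": {
        "spawn": (1, [(0.5, ["a", "b"]), (0.5, [])]),
        "idle":  (1, [(1.0, ["a"])]),
    },
    "b": {
        "die": (2, [(1.0, [])]),
    },
}

def sample_offspring(dist, rng):
    """Draw one offspring list from a finite distribution."""
    r, acc = rng.random(), 0.0
    for p, kids in dist:
        acc += p
        if r <= acc:
            return list(kids)
    return list(dist[-1][1])

def simulate(policy, steps=10, seed=0):
    """Total payoff of running the BMDP for `steps` generations under a
    memoryless, type-based policy (one fixed action per entity type)."""
    rng = random.Random(seed)
    population = ["a"]  # the state is a collection of typed entities
    total = 0
    for _ in range(steps):
        next_pop = []
        for t in population:
            action = policy[t]              # the controller picks an option
            reward, dist = BMDP[t][action]
            total += reward                 # each entity generates a payoff
            next_pop.extend(sample_offspring(dist, rng))
        population = next_pop
    return total

# Deterministic policy: "a" idles forever, so one entity earns 1 per step.
print(simulate({"a": "idle", "b": "die"}))  # prints 10
```

Fixing a single action per type, as `policy` does here, corresponds to the memoryless controllers over which the paper's linear-programming and model-free approaches optimize; a BMC is the special case where every type has exactly one option.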

Cited by 2 publications (3 citation statements)
References 25 publications (30 reference statements)
“…Due to this step, Mungojerrie has been connected to external linear program solvers. This enabled the extension of Mungojerrie to compute reward maximizing policies via a linear program for branching Markov decision processes in [18].…”
Section: Tool Design
confidence: 99%
“…We also refer readers to [26, Fig. 3], which examined RL for scLTL properties, [6] for continuous-time MDPs, and [18], which extended Mungojerrie to test model-free reinforcement learning in branching Markov decision processes.…”
Section: Case Studies
confidence: 99%
“…Being able to handle games also paves the way for using alternating automata (so long as they are good-for-MDPs) for ordinary MDPs, which has proven to allow for efficient translations from deterministic Streett to alternating Büchi automata that are good-for-MDPs, while their translation to nondeterministic Büchi automata (GFM or not) is expensive [9].…”
Section: Related Work
confidence: 99%