2022
DOI: 10.1007/978-3-031-22337-2_29

A Framework for Transforming Specifications in Reinforcement Learning

Cited by 9 publications (7 citation statements)
References 24 publications

“…Hence, if only the number of state/action pairs is allowed, alongside 1/ε and 1/δ, as parameters, creating a PAC learning algorithm for undiscounted, infinite-horizon properties is not possible. Specifically for LTL, this has been observed by Yang, Littman, and Carbin (2021) and Alur et al (2022). Example 1 (Intractability of LTL).…”
Section: Introduction
confidence: 74%
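
For context, the PAC guarantee that this statement refers to has, in its usual form, the following shape. This is a standard textbook-style formulation given only as a sketch, with assumed notation (J_M for the value of a policy on MDP M, N for the number of samples); the citing paper's exact definition may differ.

```latex
% Standard PAC-style guarantee for RL (assumed textbook formulation; the
% citing paper's exact definition may differ). With probability at least
% 1 - delta the returned policy is eps-optimal, and the sample count N is
% polynomial only in the allowed parameters.
\[
  \Pr\!\left[\, J_M(\hat{\pi}) \;\ge\; J_M(\pi^{*}) - \varepsilon \,\right] \;\ge\; 1 - \delta,
  \qquad
  N \;\le\; \operatorname{poly}\!\left(|S|,\, |A|,\, \tfrac{1}{\varepsilon},\, \tfrac{1}{\delta}\right).
\]
```

The observation quoted above is that, for undiscounted infinite-horizon objectives such as general LTL properties, no algorithm can meet this guarantee when the polynomial may depend only on those four quantities.
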
“…Example 1 (Intractability of LTL). Figure 1 is an example adopted from (Alur et al 2022) that shows the number of samples required to learn safety properties is dependent on some property of the transition structure. The objective in this example is to stay in the initial state s0 forever.…”
Section: Introduction
confidence: 99%
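
The exact MDP of Figure 1 is not reproduced on this page; the snippet below is a hypothetical two-state construction in the same spirit, offered as a minimal Python sketch. From s0, a "safe" action always stays in s0, while a "risky" action moves to an absorbing bad state with a small, unknown probability p, so under the objective "stay in s0 forever" the number of samples needed to tell the two actions apart grows like 1/p, a quantity determined by the transition structure rather than by |S|, |A|, 1/ε, or 1/δ.

```python
import random

# Hypothetical two-state MDP in the spirit of the intractability example
# quoted above (the actual Figure 1 of Alur et al. 2022 is not reproduced
# on this page). The safety objective is "stay in s0 forever".
#
# From s0, action "safe" always remains in s0; action "risky" moves to the
# absorbing bad state s1 with a small probability p. Until a failure is
# actually observed, the two actions look identical, and on average about
# 1/p samples of the risky action are needed to observe one.

def step(state, action, p, rng):
    """One transition of the hypothetical MDP."""
    if state == "s1":                        # s1 is absorbing: safety already violated
        return "s1"
    if action == "safe":
        return "s0"
    return "s1" if rng.random() < p else "s0"   # "risky" leaves s0 with probability p

def samples_until_first_failure(p, seed=0):
    """Number of 'risky' transitions drawn before the first visit to s1."""
    rng = random.Random(seed)
    state, n = "s0", 0
    while state == "s0":
        state = step(state, "risky", p, rng)
        n += 1
    return n

if __name__ == "__main__":
    for p in (1e-1, 1e-3, 1e-5):
        print(f"p = {p:g}: first failure of the risky action after "
              f"{samples_until_first_failure(p)} samples")
```
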
“…An additional approach to ensure safety in RL is through shielding, which intervenes in the agent's actions when it might violate safety constraints (Alshiekh et al 2018). Integrating formal methods, like temporal logic and Lyapunov-based techniques, into RL algorithms has emerged as a promising direction for safe RL (Hasanbeig, Abate, and Kroening 2018;Alur et al 2023;Chow et al 2018). STL Mining.…”
Section: Related Work
confidence: 99%
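
As an illustration of the shielding idea mentioned in this statement, the minimal Python sketch below wraps a policy with a safety monitor that replaces unsafe proposed actions with allowed ones. The interface (is_safe, actions, filter) is an illustrative assumption, not the construction of Alshiekh et al. (2018).

```python
from typing import Callable, Sequence

# Minimal sketch of shielding: intercept the agent's proposed action and
# substitute a safe fallback whenever the proposal would violate a safety
# check. The names and interface here are illustrative assumptions, not the
# shield construction of Alshiekh et al. (2018).

class Shield:
    def __init__(self,
                 is_safe: Callable[[object, object], bool],
                 actions: Sequence[object]):
        self.is_safe = is_safe      # safety monitor: (state, action) -> bool
        self.actions = actions      # full action set, searched for a fallback

    def filter(self, state, proposed_action):
        """Return the proposed action if safe, otherwise a safe alternative."""
        if self.is_safe(state, proposed_action):
            return proposed_action
        for a in self.actions:      # fall back to any action the monitor allows
            if self.is_safe(state, a):
                return a
        raise RuntimeError("no safe action available in this state")

# Usage sketch (hypothetical names): wrap the agent's policy during training
# and deployment.
#   shield = Shield(is_safe=my_monitor, actions=env_actions)
#   action = shield.filter(state, agent.act(state))
```
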
“…However, these analyses give reinforcement-learning algorithms for particular objectives and do not generalize to other objectives. Previous work (Alur et al 2021) gave a framework of reductions between objectives whose flavor of generality is most similar to our work; however, they did not give a condition for when an objective is PAC-learnable. To our knowledge, the PAC-learnability of the objectives in Sadigh et al (2014); Littman et al (2017); ; ; Camacho et al (2019); Jothimurugan, Alur, and Bastani (2019); are not known.…”
Section: Introduction
confidence: 99%