2018
DOI: 10.48550/arxiv.1804.02477
Preprint

Programmatically Interpretable Reinforcement Learning

Abstract: We present a reinforcement learning framework, called Programmatically Interpretable Reinforcement Learning (PIRL), that is designed to generate interpretable and verifiable agent policies. Unlike the popular Deep Reinforcement Learning (DRL) paradigm, which represents policies by neural networks, PIRL represents policies using a high-level, domain-specific programming language. Such programmatic policies have the benefits of being more easily interpreted than neural networks, and being amenable to verification…
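The abstract's core idea — expressing a policy as a short program in a domain-specific language rather than as a neural network — can be illustrated with a toy sketch. Everything below (the TrackObs fields, the constants, and the function names) is hypothetical and invented for illustration; the paper's actual DSL operates over TORCS car-racing sensor readings, and its constants are found by program search rather than written by hand.

```python
# A hypothetical sketch of a "programmatic policy" in the spirit of PIRL.
# All names and constants here are invented for illustration; the paper's
# DSL and its synthesized TORCS policies differ in detail.

from dataclasses import dataclass


@dataclass
class TrackObs:
    track_pos: float  # lateral offset from the track centre line, in [-1, 1]
    angle: float      # heading angle relative to the track axis, in radians
    speed: float      # current forward speed


def steer_policy(obs: TrackObs) -> float:
    """A human-readable steering rule: a fixed expression over named
    observations instead of a neural network. In PIRL-style search the
    constants would be chosen to imitate a learned neural policy."""
    return 0.97 * obs.angle - 0.05 * obs.track_pos


def accel_policy(obs: TrackObs) -> float:
    """A piecewise rule: accelerate below a target speed, otherwise coast."""
    if obs.speed < 25.0:
        return 0.5
    return 0.0


if __name__ == "__main__":
    obs = TrackObs(track_pos=0.2, angle=-0.1, speed=18.0)
    print(steer_policy(obs), accel_policy(obs))  # prints roughly -0.107 and 0.5
```

Because steer_policy is a fixed arithmetic expression over named inputs, a bound such as |steer| ≤ 0.97·|angle| + 0.05·|track_pos| follows by inspection or from a symbolic checker — the kind of verification the abstract alludes to, which an opaque neural policy does not readily admit.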

Cited by 21 publications (27 citation statements)
References 18 publications

“…Discovering programmatic structure from observations is a challenging problem that has been studied in the domains of grammar inference [9], program synthesis [14], programming by demonstration [4] and end-user programming [25]. Learning and working with programs not only enables model checking and interpretation of black box policies [42,33], but also results in powerful abstractions allowing knowledge transfer to novel environments.…”
Section: Related Work (mentioning)
confidence: 99%
“…Consequently, such an approach cannot be utilized in safety-critical domains that require online learning. This shortcoming is shared by other works [12,37], each of which requires learning a model based on a deep neural net before an interpretable controller can be utilized.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our problem is different in that we follow another type of constraint, yet similar methods might be applied. Using a domain-specific programming language instead of a neural network can be an alternative way to add interpretability [26], but it lacks the numerous advantages inherent in end-to-end, differentiable learning. In an alternative direction, it is also possible to manipulate the policy shape by introducing auxiliary tasks or reward shaping [27].…”
Section: Related Work (mentioning)
confidence: 99%