2018
DOI: 10.48550/arxiv.1804.02477
Preprint

Programmatically Interpretable Reinforcement Learning

Abstract: We present a reinforcement learning framework, called Programmatically Interpretable Reinforcement Learning (PIRL), that is designed to generate interpretable and verifiable agent policies. Unlike the popular Deep Reinforcement Learning (DRL) paradigm, which represents policies by neural networks, PIRL represents policies using a high-level, domain-specific programming language. Such programmatic policies have the benefits of being more easily interpreted than neural networks, and being amenable to verification…
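The abstract's core idea — expressing a policy as a short program in a domain-specific language rather than as a neural network — can be illustrated with a toy sketch. Everything below (the TrackObs fields, the constants, and the function names) is hypothetical and invented for illustration; the paper's actual DSL operates over TORCS car-racing sensor readings, and its constants are found by program search rather than written by hand.

```python
# A hypothetical sketch of a "programmatic policy" in the spirit of PIRL.
# All names and constants here are invented for illustration; the paper's
# DSL and its synthesized TORCS policies differ in detail.

from dataclasses import dataclass


@dataclass
class TrackObs:
    track_pos: float  # lateral offset from the track centre line, in [-1, 1]
    angle: float      # heading angle relative to the track axis, in radians
    speed: float      # current forward speed


def steer_policy(obs: TrackObs) -> float:
    """A human-readable steering rule: a fixed expression over named
    observations instead of a neural network. In PIRL-style search the
    constants would be chosen to imitate a learned neural policy."""
    return 0.97 * obs.angle - 0.05 * obs.track_pos


def accel_policy(obs: TrackObs) -> float:
    """A piecewise rule: accelerate below a target speed, otherwise coast."""
    if obs.speed < 25.0:
        return 0.5
    return 0.0


if __name__ == "__main__":
    obs = TrackObs(track_pos=0.2, angle=-0.1, speed=18.0)
    print(steer_policy(obs), accel_policy(obs))  # prints roughly -0.107 and 0.5
```

Because steer_policy is a fixed arithmetic expression over named inputs, a bound such as |steer| ≤ 0.97·|angle| + 0.05·|track_pos| follows by inspection or from a symbolic checker — the kind of verification the abstract alludes to, which an opaque neural policy does not readily admit.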

Cited by 21 publications (27 citation statements)
References 18 publications

“…Discovering programmatic structure from observations is a challenging problem that has been studied in the domains of grammar inference [9], program synthesis [14], programming by demonstration [4] and end-user programming [25]. Learning and working with programs not only enables model checking and interpretation of black box policies [42,33], but also results in powerful abstractions allowing knowledge transfer to novel environments.…”
Section: Related Work (mentioning)
confidence: 99%
“…Consequently, such an approach cannot be utilized in safety-critical domains that require online learning. This shortcoming is shared by other works [12,37], each of which requires learning a model based on a deep neural net before an interpretable controller can be utilized.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our problem is different in that we follow another type of constraint, yet similar methods might be applied. Using a domain-specific programming language instead of a neural network can be an alternative way to add interpretability [26], but it lacks the numerous advantages inherent in end-to-end, differentiable learning. In an alternative direction, it is also possible to manipulate the policy shape by introducing auxiliary tasks or reward shaping [27].…”
Section: Related Work (mentioning)
confidence: 99%