2020
DOI: 10.1609/icaps.v30i1.6756

Joint Inference of Reward Machines and Policies for Reinforcement Learning

Abstract: Incorporating high-level knowledge is an effective way to expedite reinforcement learning (RL), especially for complex tasks with sparse rewards. We investigate an RL problem where the high-level knowledge is in the form of reward machines, a type of Mealy machine that encodes non-Markovian reward functions. We focus on a setting in which this knowledge is not available a priori to the learning agent. We develop an iterative algorithm that performs joint inference of reward machines and policies for RL (more s…
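As a rough illustration of the structure the abstract refers to, the sketch below models a reward machine as a small Mealy machine: states track progress through a non-Markovian task, and each transition consumes a high-level label and emits a reward. The class name, labels, and two-step example task are hypothetical, not taken from the paper.

```python
# Minimal sketch of a reward machine as a Mealy machine (hypothetical names,
# not the paper's implementation). States track non-Markovian task progress;
# each transition consumes a high-level label and emits a scalar reward.

class RewardMachine:
    def __init__(self, initial_state, transitions):
        # transitions: dict mapping (state, label) -> (next_state, reward)
        self.initial_state = initial_state
        self.transitions = transitions

    def step(self, state, label):
        # Unlisted (state, label) pairs self-loop with zero reward.
        return self.transitions.get((state, label), (state, 0.0))


# Example: "reach a, then reach b" yields reward 1 only after both events occur
# in order -- a reward function that is non-Markovian in the raw observations.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "a"): ("u1", 0.0),
        ("u1", "b"): ("u2", 1.0),
    },
)

state = rm.initial_state
for label in ["b", "a", "b"]:
    state, reward = rm.step(state, label)
    print(label, state, reward)   # only the final "b" produces reward 1.0
```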

Cited by 30 publications (20 citation statements)
References 25 publications
“…Task 3 is the Minecraft-like gridworld introduced in (Andreas, Klein, and Levine 2017), where the objective is to build a spear by gathering wood, string, and stone in any order, then reaching a workbench (See Figure 3 in (Andreas, Klein, and Levine 2017)). We evaluate the time and sample efficiency of our non-Markovian planning method compared to the JIRP and AFRAI methods presented in (Xu et al 2020) and (Xu et al 2021), respectively. A sample in our method includes the queries to the model during learning and the environment samples observed during execution.…”
Section: Methods
Mentioning, confidence: 99%
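To make the "gather three items in any order, then reach a workbench" structure from the statement above concrete, here is a hedged sketch (illustrative names and rewards, not the cited implementation) that enumerates the progress states such a task induces: one state per subset of collected items, plus a terminal state.

```python
# Hypothetical encoding of a "collect wood, string, and stone in any order,
# then reach the workbench" task as reward-machine transitions. States are
# frozensets of items gathered so far; rewards and labels are illustrative.

from itertools import chain, combinations

ITEMS = ("wood", "string", "stone")

def powerset(items):
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

transitions = {}
for collected in map(frozenset, powerset(ITEMS)):
    for item in ITEMS:
        if item not in collected:
            transitions[(collected, item)] = (collected | {item}, 0.0)
    if collected == frozenset(ITEMS):
        transitions[(collected, "workbench")] = ("done", 1.0)

print(len(transitions))  # 12 gathering transitions plus 1 completing "workbench" transition
```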
“…Active inference allows the system under learning to be queried, whereas its passive counterpart leverages an existing static repository of samples. The work in (Xu et al. 2020) uses passive grammatical inference by storing traces of behavior that arise in the standard Q-learning methodology. This repository can then be used to synthesize a reward machine using techniques such as satisfiability solving.…”
Section: Related Work
Mentioning, confidence: 99%
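A minimal sketch of the passive trace-collection idea described in the statement above, assuming a hypothetical environment interface (`reset`, `actions`, `step`) and a labeling function that maps observations to high-level propositions; the SAT-based reward-machine synthesis step itself is omitted.

```python
# Rough sketch: run ordinary Q-learning, record each episode's label/reward
# trace, and hand the accumulated traces to a separate reward-machine
# synthesizer (e.g. a satisfiability-based one, not shown here).
# The environment and labeler interfaces are hypothetical stubs.

import random
from collections import defaultdict

def q_learning_with_traces(env, labeler, episodes=100, alpha=0.1, gamma=0.9, eps=0.1):
    q = defaultdict(float)           # (observation, action) -> estimated value
    traces = []                      # repository of (label, reward) sequences
    for _ in range(episodes):
        obs, trace, done = env.reset(), [], False
        while not done:
            actions = env.actions(obs)
            if random.random() < eps:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(obs, a)])
            next_obs, reward, done = env.step(obs, action)
            best_next = max((q[(next_obs, a)] for a in env.actions(next_obs)), default=0.0)
            q[(obs, action)] += alpha * (reward + gamma * best_next - q[(obs, action)])
            trace.append((labeler(next_obs), reward))   # store high-level labels, not raw states
            obs = next_obs
        traces.append(trace)
    return q, traces    # traces can later seed reward-machine synthesis
```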
“…Sarathy et al. (2021) incorporates RL with symbolic planning models to learn new operators, similar to our subtasks, to aid in the completion of planning objectives. Meanwhile, Toro Icarte et al. (2018), Xu et al. (2020), and Toro Icarte et al. (2022) use reward machines, finite-state machines encoding temporally extended tasks in terms of atomic propositions, to break tasks into stages for which separate policies can be learned. Neary et al. (2021b) extends the use of reward machines to the multi-agent RL setting, decomposing team tasks into subtasks for individual learners.…”
Section: Related Work
Mentioning, confidence: 99%
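The "separate policies per stage" idea in the statement above can be sketched as one Q-table per reward-machine state (a QRM-style decomposition); the class and method names below are illustrative, not taken from the cited works.

```python
# Illustrative sketch (not the cited implementation): keep one tabular
# Q-function per reward-machine state, so each task stage has its own policy.

from collections import defaultdict

class StagewisePolicies:
    def __init__(self, rm_states):
        self.q = {u: defaultdict(float) for u in rm_states}  # one Q-table per stage

    def best_action(self, rm_state, obs, actions):
        # Greedy action under the policy attached to the current reward-machine state.
        return max(actions, key=lambda a: self.q[rm_state][(obs, a)])

    def update(self, rm_state, obs, action, target, alpha=0.1):
        table = self.q[rm_state]
        table[(obs, action)] += alpha * (target - table[(obs, action)])


policies = StagewisePolicies(rm_states=["u0", "u1", "u2"])
policies.update("u0", obs=(0, 0), action="north", target=1.0)
print(policies.best_action("u0", obs=(0, 0), actions=["north", "south"]))  # -> "north"
```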
“…While much progress has been made on learning and leveraging reward machines for decision processes with non-Markovian rewards (Xu et al. 2020, 2021; Abadi and Brafman 2020; Gaon and Brafman 2020; Neider et al. 2021; Rens et al. 2021), the more general setting where rewards exhibit both non-Markovian and stochastic dynamics has not been addressed. In this paper, we make progress on this front by introducing probabilistic reward machines (PRMs).…”
Section: Introduction
Mentioning, confidence: 99%
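As a hedged sketch of what a probabilistic reward machine could look like under the description above, each transition emits a reward drawn from a distribution rather than a fixed value; the interface and example numbers are assumptions for illustration only.

```python
# Hedged sketch of a probabilistic reward machine: like a reward machine,
# but a transition's output is a distribution over rewards. Names and
# probabilities are illustrative, not the cited formulation.

import random

class ProbabilisticRewardMachine:
    def __init__(self, initial_state, transitions):
        # transitions: (state, label) -> (next_state, [(reward, probability), ...])
        self.initial_state = initial_state
        self.transitions = transitions

    def step(self, state, label):
        next_state, reward_dist = self.transitions.get((state, label), (state, [(0.0, 1.0)]))
        rewards, probs = zip(*reward_dist)
        return next_state, random.choices(rewards, weights=probs, k=1)[0]


prm = ProbabilisticRewardMachine(
    "u0",
    {("u0", "goal"): ("u1", [(1.0, 0.8), (0.0, 0.2)])},  # stochastic reward on task completion
)
print(prm.step("u0", "goal"))  # ("u1", 1.0) with probability 0.8, else ("u1", 0.0)
```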