2023
DOI: 10.1016/j.artint.2023.103989
Learning reward machines: A study in partially observable reinforcement learning

Cited by 3 publications (3 citation statements)
References 36 publications
“…We showed that these are less efficient than our approach by benchmarking against the only available codebase that learnt an automaton from our traces. Meanwhile, Icarte et al. [32] use Tabu search, which relies on already knowing a partial model, and Furelos-Blanco et al. [12] use an Answer Set Programming algorithm, which assumes a known upper bound for the maximum finite distance between TA states. Our assumptions are weaker than these, as we are only required to guess an upper bound on the number of TA states (in Section 4, we explained why this is easy to do), and we do not require any a priori knowledge about the spatial MDP.…”
Section: Related Research (mentioning)
confidence: 99%
“…There have been several proposals for the automated generation of RMs. For example, in [19,20], an RM is learned by an agent through experience in the environment. Closer to our work is [4], where an RM for a single agent is generated using LTL and other logics that are equivalent to regular languages, and [9], where a single-agent RM is generated from a sequential or a partial-order plan.…”
Section: Related Work (mentioning)
confidence: 99%
“…Reward machines (RMs) [4,18,19] have recently been proposed as a way of specifying rewards for reinforcement learning (RL) agents. RMs are Mealy machines used to specify tasks and rewards based on a high-level abstraction of the agent's environment.…”
Section: Introduction (mentioning)
confidence: 99%
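
To make the Mealy-machine reading of a reward machine concrete, the following is a minimal sketch in Python. The task (reach a coffee machine, then an office), the event symbols, the state names, and the reward values are hypothetical illustrations, not taken from the cited papers; the sketch only shows how a reward machine maps a trace of high-level events to rewards.

# Minimal sketch of a reward machine viewed as a Mealy machine.
# All names, events, and rewards below are illustrative assumptions.

class RewardMachine:
    """Mealy machine over abstract propositional events.

    delta maps (state, event) -> next state; rho maps (state, event) -> reward.
    Events the machine does not react to leave the state unchanged with reward 0.
    """

    def __init__(self, states, initial, delta, rho, terminal):
        self.states = states
        self.initial = initial
        self.delta = delta          # dict: (u, event) -> next state u'
        self.rho = rho              # dict: (u, event) -> reward
        self.terminal = terminal    # set of absorbing (task-complete) states
        self.u = initial            # current machine state

    def step(self, event):
        """Advance on one abstract event and return the emitted reward."""
        key = (self.u, event)
        reward = self.rho.get(key, 0.0)
        self.u = self.delta.get(key, self.u)
        return reward

    def done(self):
        return self.u in self.terminal


# Hypothetical task: observe the coffee machine ('c'), then the office ('o').
rm = RewardMachine(
    states={"u0", "u1", "u_acc"},
    initial="u0",
    delta={("u0", "c"): "u1", ("u1", "o"): "u_acc"},
    rho={("u1", "o"): 1.0},
    terminal={"u_acc"},
)

for event in ["x", "c", "o"]:   # abstract event trace produced by the environment
    r = rm.step(event)
    print(event, r, rm.u)

Running the sketch on the event trace ['x', 'c', 'o'] emits reward 1.0 only on the final transition into the accepting state, which illustrates how an RM exposes temporally extended task structure to an RL agent through a compact, event-level abstraction of the environment.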