Proceedings of the 13th International Conference on Agents and Artificial Intelligence 2021
DOI: 10.5220/0010212000740086
Online Learning of non-Markovian Reward Models

Cited by 5 publications (5 citation statements)
References: 0 publications
“…Since then, RMs have been used for solving problems in planning (Illanes et al., 2019, 2020), robotics (DeFazio & Zhang, 2021; Camacho et al., 2020, 2021), multi-agent systems (Neary et al., 2021), lifelong RL (Zheng et al., 2021), and partial observability (Toro Icarte et al., 2019a). […] also considered both Mealy and Moore versions of RMs, though theirs only output numbers (like our simple RMs) instead of reward functions. Finally, there has been prominent work on how to learn RMs from experience (e.g., Toro Icarte et al., 2019a, 2019b; Xu et al., 2020a, 2020b; Furelos-Blanco et al., 2020a; Rens & Raskin, 2020; Hasanbeig et al., 2021; Velasquez et al., 2021). Since our previous work, we have gained practical experience and new theoretical insights about reward machines, which were reflected in this paper. In particular, we provided a cleaner definition of reward machines and QRM.…”
Section: Reward Machine Research
confidence: 78%
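
To make the distinction drawn in the quote above concrete, the following is a minimal sketch of a "simple", Mealy-style reward machine whose transitions output numbers directly rather than reward functions. The class name, states, and propositions are illustrative assumptions, not taken from the cited papers.

# Minimal sketch (not from the cited papers) of a "simple", Mealy-style reward
# machine: each transition is keyed by (machine state, label) and outputs a
# number directly, rather than a reward function over environment transitions.
from typing import Dict, FrozenSet, Tuple

Label = FrozenSet[str]  # propositions true at one environment step, e.g. {"coffee"}

class SimpleRewardMachine:
    def __init__(self, initial_state: str,
                 delta: Dict[Tuple[str, Label], Tuple[str, float]]):
        self.state = initial_state
        self.delta = delta  # (u, label) -> (u', reward)

    def step(self, label: Label) -> float:
        """Advance on one labelled step; unknown labels self-loop with reward 0."""
        next_state, reward = self.delta.get((self.state, label), (self.state, 0.0))
        self.state = next_state
        return reward

# Illustrative task: reward 1 only after observing "coffee" and then "office",
# which is non-Markovian in the environment state but Markovian in the machine state.
rm = SimpleRewardMachine("u0", {
    ("u0", frozenset({"coffee"})): ("u1", 0.0),
    ("u1", frozenset({"office"})): ("u2", 1.0),
})
print(rm.step(frozenset({"coffee"})), rm.step(frozenset({"office"})))  # 0.0 1.0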
“…Many questions remain open regarding reward machines. For instance, we know how to learn reward machines from experience (Toro Icarte et al., 2019a; Xu et al., 2020a, 2020b; Furelos-Blanco et al., 2020a; Rens & Raskin, 2020), but all these methods assume access to a correct labelling function. How to learn RMs and a labelling function at the same time remains unknown.…”
Section: Discussion
confidence: 99%
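
The labelling function these methods take as given is the mapping from low-level environment transitions to high-level propositional symbols. Below is a hedged sketch of what that assumed input looks like; the grid positions and proposition names are invented for illustration.

# Hedged sketch of a labelling function L(s, a, s') -> 2^P, the component the
# RM-learning methods above assume is given and correct. The grid positions and
# proposition names below are invented for illustration, not from the cited work.
from typing import FrozenSet, Tuple

State = Tuple[int, int]  # (x, y) cell in a toy grid world

COFFEE_CELL: State = (0, 3)
OFFICE_CELL: State = (4, 4)

def labelling_function(s: State, a: int, s_next: State) -> FrozenSet[str]:
    """Map one low-level transition to the high-level propositions it triggers."""
    props = set()
    if s_next == COFFEE_CELL:
        props.add("coffee")
    if s_next == OFFICE_CELL:
        props.add("office")
    return frozenset(props)

print(labelling_function((0, 2), 0, (0, 3)))  # frozenset({'coffee'})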
“…Finally, we note that different approaches to learn RMs were proposed simultaneously with, or shortly after, our original publication (e.g., Xu et al., 2020a, 2020b; Furelos-Blanco et al., 2020; Rens et al., 2020; Gaon and Brafman, 2020; Memarian et al., 2020; Neider et al., 2021; Hasanbeig et al., 2021). They all learn reward machines in fully observable domains.…”
Section: Related Work
confidence: 95%
“…These include methods that learn reward machines using a SAT solver (Xu et al., 2020a; Neider et al., 2021), inductive logic programming (Furelos-Blanco et al., 2020), and program synthesis (Hasanbeig et al., 2021). There has also been work on adapting the L* algorithm (Angluin, 1987) to learn RMs given the model of the MDP (Rens et al., 2020), expert demonstrations (Memarian et al., 2020), or in a pure RL setting (Gaon and Brafman, 2020; Xu et al., 2020b).…”
Section: Related Work
confidence: 99%
“…In contrast, our maximum-likelihood approach does not a priori require any structure of the specification or the spatial MDP environment. Meanwhile, [11, 13, 27, 31, 37] use Angluin's [5] L* algorithm to learn a TA, relying on an oracle for equivalence and membership queries. We assume that the agent cannot access an oracle and must learn the TA fully autonomously, which aligns with the standard setup of model-free RL (note that L* was not originally developed for RL applications).…”
Section: Related Research
confidence: 99%
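
For readers unfamiliar with Angluin's L*, the two oracle queries mentioned in the quote can be summarised by the hypothetical interface below. Neither query is available to a standard model-free RL agent, which is the gap the quoted comparison points out; the names and signatures are illustrative only.

# Hypothetical interface for the two L* query types mentioned above (Angluin, 1987).
# Neither query is available to a standard model-free RL agent; names and
# signatures here are illustrative assumptions, not from the cited work.
from abc import ABC, abstractmethod
from typing import List, Optional

Trace = List[frozenset]  # a sequence of label sets produced by the labelling function

class LStarTeacher(ABC):
    @abstractmethod
    def membership_query(self, trace: Trace) -> float:
        """Return the exact output the true (hidden) automaton assigns to this trace."""

    @abstractmethod
    def equivalence_query(self, hypothesis: object) -> Optional[Trace]:
        """Return None if the hypothesis automaton is correct, else a counterexample trace."""

# A pure-RL learner must approximate both: membership answers come from (noisy)
# rollouts that realise a trace, and counterexamples come from observed traces
# whose rewards disagree with the current hypothesis automaton.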