2017
DOI: 10.3390/make1010002

Learning to Teach Reinforcement Learning Agents

Abstract: In this article, we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance and the importance of reward discounting in advising. The experiments show that the best performers are not always the b…

Cited by 50 publications (46 citation statements)
References 15 publications
“…However, this requires a lot of resources and sometimes does not really work out well. One interesting approach is to use data mining methods [11,12] which, from the available data, use an analytic process to give information about a problem in the future.…”
Section: Introduction (mentioning)
confidence: 99%
“…A simplified representation of the process implemented by the Q-learning algorithm in order to control a PV system for the implementation of the GMPPT process is presented in Figure 3. In Q-learning, an agent interacts with the unknown environment (i.e., the PV system) and gains experience through a specific set of states, actions and rewards encountered during this interaction [24][25][26][27]. Q-learning strives to learn the Q-values of state-action pairs, which represent the expected total discounted reward in the long term.…”
Section: The Proposed Q-learning-based Methods For Photovoltaic (Pv) G… (mentioning)
confidence: 99%
“…Typically, experience for learning is recorded in terms of samples (S_t, a_t, R_t, S_{t+1}), meaning that at some time step t, action a_t was executed in state S_t and a transition to the next state S_{t+1} was observed, while reward R_t was received. The Q-learning update rule, given a sample (S_t, a_t, R_t, S_{t+1}) at time step t, is defined as follows: …”
Section: The Proposed Q-learning-based Methods For Photovoltaic (Pv) G… (mentioning)
confidence: 99%
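
The excerpt above stops before the update rule itself, and the citing paper's exact equation is not shown here. As a rough illustration only, the following sketch uses the standard textbook form of tabular Q-learning, Q(S_t, a_t) ← Q(S_t, a_t) + α [R_t + γ max_a Q(S_{t+1}, a) − Q(S_t, a_t)]. The function name, the learning rate `alpha`, the discount factor `gamma`, and the toy states and actions are all assumptions made for this example.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update for a sample (S_t, a_t, R_t, S_{t+1}).

    alpha (learning rate) and gamma (discount factor) are illustrative
    defaults, not values taken from the cited work.
    """
    # Greedy estimate of the value of the next state.
    best_next = max(Q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# Hypothetical two-action toy problem; unseen Q-values start at 0.0.
Q = defaultdict(float)
actions = ["left", "right"]
new_value = q_learning_update(Q, "s0", "right", 1.0, "s1", actions)
```

With all Q-values initialised to zero, the first update moves Q("s0", "right") one tenth of the way toward the reward of 1.0.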
“…• Q-Teaching Reward (QTR): The QTR advising-level reward extends Q-Teaching (Fachantidis, Taylor, and Vlahavas 2017) to MARL by using…”
Section: Contribution (mentioning)
confidence: 99%