2021
DOI: 10.1609/aaai.v35i12.17306

The Sample Complexity of Teaching by Reinforcement on Q-Learning

Abstract: We study the sample complexity of teaching, termed the "teaching dimension" (TDim) in the literature, for the teaching-by-reinforcement paradigm, where the teacher guides the student through rewards. This is distinct from the teaching-by-demonstration paradigm, motivated by robotics applications, where the teacher teaches by providing demonstrations of state/action trajectories. The teaching-by-reinforcement paradigm applies to a wider range of real-world settings where a demonstration is inconvenient, but has …
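As a rough illustration of the teaching-by-reinforcement paradigm described in the abstract, the sketch below has a teacher choose each reward so that a tabular Q-learning student ends up with a teacher-chosen target policy. Everything here is a hypothetical assumption for illustration: the environment (a cyclic chain of states whose next state does not depend on the action), the uniform-random exploration of the student, the ±1 reward rule, and the names `q_update`, `teach`, and `greedy`. It is not the paper's teaching algorithm and says nothing about the teaching dimension; it only shows the mechanism of a teacher steering a Q-learner through rewards.

```python
import random


def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One standard tabular Q-learning step:
    Q[s][a] += alpha * (r + gamma * max_a' Q[s_next][a'] - Q[s][a])."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])


def teach(target_policy, n_states, n_actions, steps=2000, seed=0):
    """Hypothetical naive teacher: the student picks actions uniformly at
    random, and the teacher returns reward +1 when the action matches the
    target policy and -1 otherwise. Transitions are an assumed cyclic
    chain s -> (s + 1) % n_states, independent of the action."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)                    # student's exploratory action
        r = 1.0 if a == target_policy[s] else -1.0      # teacher-chosen reward
        s_next = (s + 1) % n_states
        q_update(Q, s, a, r, s_next)
        s = s_next
    return Q


def greedy(Q):
    """The student's greedy policy: argmax over actions in each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in Q]


Q = teach([1, 0, 1], n_states=3, n_actions=2)
print(greedy(Q))
```

Because the transitions and rewards are deterministic, the learned values approach the fixed point in which the target action is worth 1 / (1 - 0.9) = 10 and any other action is worth 2 less, so the student's greedy policy should match the target. The sketch deliberately uses far more interactions than needed; counting how few teacher-chosen rewards would suffice is precisely the sample-complexity question the paper studies.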

Cited by 12 publications (34 citation statements). References: 23 publications.
“…This section summarizes the aforementioned works and discusses how our study differs from these existing attacks [14][15][16]. For example, Ma et al. [14] poison rewards in the training set for the batch RL agent, and Zhang et al. [15] study adaptive reward-poisoning attacks on the Q-learning RL agent. Rakhsha et al. [16] poison either reward values or transition functions to attack RL agents performing cyclic tasks (i.e., tasks without termination states).…”
Section: Discussion (mentioning)