2019
DOI: 10.1609/aaai.v33i01.33019939

Reinforcement Learning under Threats

Abstract: In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. In this paper, we introduce Threatened Markov Decision Processes (TMDPs), which provide a framework to support a decision maker against a potential adversary in RL. Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. After introducing our framework and deriving theoretical results, relevant empiric…

Cited by 19 publications (14 citation statements). References 20 publications (16 reference statements).
“…Instead, the possibility to strategically act on the environmental dynamics is studied in a limited number of works only. Some approaches belong to the planning area [12,38], some are constrained to specific forms of environment configurability [8,9,34], and others are based on the curriculum learning framework [4,7]. The goal of the dissertation [18] is to provide a uniform treatment of environment configurability in its diverse aspects.…”
Section: Configurable Environments
confidence: 99%
“…The adversary can target any component of the Markov decision process (MDP). First, the adversary may choose to perturb rewards, either by attacking rewards directly [15][16][17] or by attacking other, indirect parts of the RL training process [18][19]. Second, the adversary can target the RL agent itself.…”
Section: A. Adversarial Attacks in Classification Tasks and RL Models
confidence: 99%
“…Subsequently, at each time step, the adversary may decide whether to add a perturbation δ to the next state v by observing the current clean state. Before adding this perturbation to the next state, the adversary checks whether its attacks have exceeded the maximum attack volume, as stated in Equation (17). After the remaining attack volume is calculated, the perturbation the adversary has made is added to the next state.…”
Section: E. Reward Function and Policy Network
confidence: 99%
“…There are several adversarial detection models for DNN classifiers that are applicable to DRL agents [35], [36]. Sophisticated adversarial detection models designed specifically for DRL agents have also been proposed in the literature [37], [38].…”
Section: Defense Models for DRL
confidence: 99%