2019
DOI: 10.1609/aaai.v33i01.33019939

Reinforcement Learning under Threats

Abstract: In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. In this paper, we introduce Threatened Markov Decision Processes (TMDPs), which provide a framework to support a decision maker against a potential adversary in RL. Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. After introducing our framework and deriving theoretical results, relevant empiric…

Cited by 19 publications (14 citation statements). References 20 publications (16 reference statements).
“…Instead, the possibility to strategically act on the environmental dynamics is studied in a limited number of works only. Some approaches belong to the planning area [12,38], some are constrained to specific forms of environment configurability [8,9,34], and others are based on the curriculum learning framework [4,7]. The goal of the dissertation [18] is to provide a uniform treatment of environment configurability in its diverse aspects.…”
Section: Configurable Environments
confidence: 99%
“…The adversary can target any component of the Markov decision process (MDP). First, the adversary may choose to perturb rewards, either by attacking rewards directly [15][16][17] or by attacking other, indirect parts of the RL training process [18][19]. Second, the adversary can target the RL agent itself.…”
Section: A. Adversarial Attacks in Classification Tasks and RL Models
confidence: 99%
“…Subsequently, at each time step, the adversary may decide whether to add a perturbation δ to the next state v by observing the current clean state. Before adding this perturbation to the next state, the adversary checks whether its attacks have exceeded the maximum attack volume, as stated in Equation (17). After the remaining attack volume is calculated, the perturbation the adversary has made is added to the next state.…”
Section: E. Reward Function and Policy Network
confidence: 99%
“…There are several adversarial detection models for DNN classifiers that are applicable to DRL agents [35], [36]. Sophisticated adversarial detection models designed specifically for DRL agents have also been proposed in the literature [37], [38].…”
Section: Defense Models for DRL
confidence: 99%