2018
DOI: 10.48550/arxiv.1808.05770
Preprint

Reinforcement Learning for Autonomous Defence in Software-Defined Networking

Cited by 1 publication (6 citation statements)
References 27 publications
“…The literature on test-time attacks on DRL that are not based on adversarial examples is still very scarce. For instance, Han et al. [55] investigate the case of a DRL agent in a Software Defined Network (SDN), tasked with preventing the propagation of malware in the network by identifying compromised nodes and choosing one of the following actions at each time step: isolating and patching a node, reconnecting a node and its links, migrating the critical server, and taking no action. The reward for this agent depends on whether the critical servers are compromised, the number of nodes reachable from those servers, the number of compromised nodes, and the cost of migration.…”
Section: Test-time Attacks (mentioning)
confidence: 99%
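
The reward structure described in this statement can be summarised in a short sketch. The following Python snippet is a minimal illustration of that structure only; the function name, weights, and penalty values are assumptions added for readability and are not taken from Han et al. [55].

import networkx as nx

def defence_reward(graph, compromised, critical_servers, migrated,
                   w_reach=1.0, w_comp=1.0, migration_cost=5.0):
    """Illustrative per-step reward for the SDN defence agent."""
    # Heavy penalty whenever any critical server is compromised.
    if any(s in compromised for s in critical_servers):
        return -100.0
    # Reward every node still reachable from a critical server.
    reachable = set()
    for s in critical_servers:
        reachable |= nx.descendants(graph, s)
    reward = w_reach * len(reachable)
    # Penalise compromised nodes and the cost of migrating the server.
    reward -= w_comp * len(compromised)
    if migrated:
        reward -= migration_cost
    return reward

An agent trained against a reward of this shape is pushed to keep benign nodes reachable from the critical servers while isolating compromised ones and avoiding unnecessary migrations.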
“…It is also assumed that the detection mechanism of the agent can be manipulated by the adversary (i.e., the adversary can induce False Positive (FP) or False Negative (FN) results in the detector), but is constrained by a threshold on how many such manipulations can be implemented at each time step. The test-time attacks proposed in [55] are two-fold: indiscriminate attacks aim to prevent the DRL agent from taking the optimal action a_t at time t, and targeted attacks aim to force the agent into taking a specific action a_t at time t. Considering DDQN and A3C as DRL algorithms for the target agent, the objective for targeting DDQN agents is to maximize Q(s_t + δ_t, a_t) for action a_t at state s_t using perturbation δ_t. Similarly, the objective for targeting A3C is to maximize π(a_t | s_t + δ_t) for the stochastic policy π.…”
Section: Test-time Attacks (mentioning)
confidence: 99%
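
The targeted objective quoted above can be made concrete with a small sketch. Assuming the detection readings form a binary state vector and that q_network is any callable returning the Q-values of a trained DDQN agent, the snippet below greedily flips at most budget readings (inducing FPs or FNs) so as to maximise Q(s_t + δ_t, a_target). The greedy search itself is an illustrative assumption, not necessarily the procedure used in [55]; the A3C case is analogous, with the policy probability π(a_target | s_t + δ_t) in place of the Q-value.

import numpy as np

def targeted_flip_attack(q_network, state, a_target, budget):
    """Greedily flip binary detection bits to push the agent towards a_target."""
    s = np.asarray(state, dtype=np.float32).copy()
    flipped = []
    for _ in range(budget):
        base_q = q_network(s)[a_target]
        best_gain, best_i = 0.0, None
        for i in range(len(s)):
            if i in flipped:
                continue
            trial = s.copy()
            trial[i] = 1.0 - trial[i]          # induce a FP or FN at node i
            gain = q_network(trial)[a_target] - base_q
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:                      # no single remaining flip helps
            break
        s[best_i] = 1.0 - s[best_i]
        flipped.append(best_i)
    return s, flipped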