2018
DOI: 10.48550/arxiv.1808.05770
Preprint

Reinforcement Learning for Autonomous Defence in Software-Defined Networking

Cited by 1 publication (6 citation statements)
References 27 publications
“…The literature on test-time attacks on DRL that are not based on adversarial examples is still very scarce. For instance, Han et al. [55] investigate the case of a DRL agent in a Software Defined Network (SDN), tasked with preventing the propagation of malware in the network by identifying compromised nodes and choosing one of the following actions at each time step: isolating and patching a node, reconnecting a node and its links, migrating the critical server, and taking no action. The reward for this agent depends on whether the critical servers are compromised, the number of nodes reachable from those servers, the number of compromised nodes, and the cost of migration.…”
Section: Test-time Attacks (mentioning)
confidence: 99%
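
The reward structure described in this statement can be summarised in a short sketch. The following Python snippet is a minimal illustration of that structure only; the function name, weights, and penalty values are assumptions added for readability and are not taken from Han et al. [55].

import networkx as nx

def defence_reward(graph, compromised, critical_servers, migrated,
                   w_reach=1.0, w_comp=1.0, migration_cost=5.0):
    """Illustrative per-step reward for the SDN defence agent."""
    # Heavy penalty whenever any critical server is compromised.
    if any(s in compromised for s in critical_servers):
        return -100.0
    # Reward every node still reachable from a critical server.
    reachable = set()
    for s in critical_servers:
        reachable |= nx.descendants(graph, s)
    reward = w_reach * len(reachable)
    # Penalise compromised nodes and the cost of migrating the server.
    reward -= w_comp * len(compromised)
    if migrated:
        reward -= migration_cost
    return reward

An agent trained against a reward of this shape is pushed to keep benign nodes reachable from the critical servers while isolating compromised ones and avoiding unnecessary migrations.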
“…It is also assumed that the detection mechanism of the agent can be manipulated by the adversary (i.e., the adversary can induce False Positive (FP) or False Negative (FN) results in the detector), but is constrained by a threshold on how many such manipulations can be implemented at each time step. The test-time attacks proposed in [55] are two-fold: indiscriminate attacks aim to prevent the DRL agent from taking the optimal action a_t at time t, and targeted attacks aim to force the agent into taking a specific action a_t at time t. Considering DDQN and A3C as DRL algorithms for the target agent, the objective for targeting DDQN agents is to maximize Q(s_t + δ_t, a_t) for action a_t at state s_t using perturbation δ_t. Similarly, the objective for targeting A3C is to maximize π(a_t | s_t + δ_t) for the stochastic policy π.…”
Section: Test-time Attacks (mentioning)
confidence: 99%
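
The targeted objective quoted above can be made concrete with a small sketch. Assuming the detection readings form a binary state vector and that q_network is any callable returning the Q-values of a trained DDQN agent, the snippet below greedily flips at most budget readings (inducing FPs or FNs) so as to maximise Q(s_t + δ_t, a_target). The greedy search itself is an illustrative assumption, not necessarily the procedure used in [55]; the A3C case is analogous, with the policy probability π(a_target | s_t + δ_t) in place of the Q-value.

import numpy as np

def targeted_flip_attack(q_network, state, a_target, budget):
    """Greedily flip binary detection bits to push the agent towards a_target."""
    s = np.asarray(state, dtype=np.float32).copy()
    flipped = []
    for _ in range(budget):
        base_q = q_network(s)[a_target]
        best_gain, best_i = 0.0, None
        for i in range(len(s)):
            if i in flipped:
                continue
            trial = s.copy()
            trial[i] = 1.0 - trial[i]          # induce a FP or FN at node i
            gain = q_network(trial)[a_target] - base_q
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:                      # no single remaining flip helps
            break
        s[best_i] = 1.0 - s[best_i]
        flipped.append(best_i)
    return s, flipped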