Cyberspace attack and defense game based on reward randomization reinforcement learning

Zhang, Lei; Li, Hongmei; Pan, Yu; Zheng, Qibin; Li, Wei; Liu, Yi

doi:10.1016/j.array.2022.100262

Cited by 5 publications

(10 citation statements)

References 45 publications

(51 reference statements)

“…7: A mixed threshold strategy where $\sigma(\theta^{(1)}_l)$ is the threshold (0.5 in this example); the x-axis indicates the defender's belief state $b(1) \in [0, 1]$ and the y-axis indicates the probability prescribed by $\pi_{1,\theta^{(1)}}$ to the stop action $S$ (32). Similarly, the attacker's best response strategy $\pi_2$ is parameterized with the vector $\theta^{(2)} \in \mathbb{R}^{2L}$ (33).…”

Section: Our Self-play Algorithm: T-FP (mentioning)

confidence: 99%
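The mixed threshold strategy described in the statement above can be illustrated with a minimal sketch. The excerpt does not give the exact functional form, so the logistic smoothing, the `steepness` parameter, and the function names below are assumptions made only for illustration; the only details taken from the excerpt are that the threshold is σ(θ) and that the strategy maps a belief in [0, 1] to a probability of the stop action S.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def stop_probability(belief: float, theta: float, steepness: float = 20.0) -> float:
    """Probability of taking the stop action S at a given belief.

    The threshold is sigmoid(theta), as in the quoted caption.
    The logistic smoothing and `steepness` are hypothetical choices,
    not the form used in the cited paper.
    """
    threshold = sigmoid(theta)
    return sigmoid(steepness * (belief - threshold))


# theta = 0 gives a threshold of 0.5, matching the example in the caption.
for b in (0.1, 0.5, 0.9):
    print(f"belief={b:.1f} -> P(stop)={stop_probability(b, theta=0.0):.4f}")
```

With this shape, beliefs well below the threshold almost never trigger a stop and beliefs well above it almost always do, while beliefs near the threshold yield a genuinely mixed action.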

“…Although a growing body of work uses reinforcement learning and game theory to find intrusion response strategies (see Section VII), a direct comparison between the defender strategies learned in our framework and those found in previous work is not feasible for two reasons. First, nearly all of the prior works have developed defender strategies for custom simulations [8], [10], [10], [11], [18]-[27], [33]-[35], [38], [39], [60], [62], [63], [67]-[72], [107]-[119] and there is no obvious way to map their solutions to an emulated environment like ours (see Fig. 1 and Appendix C).…”

Section: A Learning Equilibrium Strategies Through Self-play (mentioning)

confidence: 99%

“…A large number of studies have focused on applying reinforcement learning to use cases similar to the intrusion response use case we discuss in this paper [9]-[11], [17]-[52], [64], [72]. These works use a variety of models, including MDPs [20], [23], [25], [26], [31], [34], [36], [42], [51], [52], [64], Stochastic games [10], [18], [28], [33], [45], [72], attack graphs [35], Petri nets [43], and POMDPs [9], [11], [21], [27], as well as various reinforcement learning algorithms, including Q-learning [18], [20], [23], [40], [43], [48], [64], [69], SARSA [21], PPO [10], [11], [34], [35], [37], hierarchical reinforcement learning [25], DQN [26], [36]-…”

Section: Reinforcement Learning For Automated Intrusion Response (mentioning)

confidence: 99%

“…Third, our method to find effective defender strategies includes using an emulation system in addition to a simulation system. The advantage of our method compared to the simulation-only approaches [10], [11], [18]-[27], [33]-[35], [38], [39], [44]-[46], [50], [52], [52], [64], [69], [70], [72] is that the parameters of our simulation system are determined by measurements from an emulation system instead of being chosen by a human expert. Further, the learned strategies are evaluated in the emulation system, not in the simulation system.…”

Section: Reinforcement Learning For Automated Intrusion Response (mentioning)

confidence: 99%

“…Third, we provide evaluation results from an emulated infrastructure. This addresses a drawback in related research that relies solely on simulations to learn and evaluate strategies [8], [10], [11], [18]-[27], [33]-[35], [38], [39], [43]-[46], [50], [52], [54], [60], [62]-[72].…”

Section: Introduction (mentioning)

confidence: 99%

Learning Near-Optimal Intrusion Responses Against Dynamic Attackers

Hammar, Stadler

2023

Preprint

We study automated intrusion response and formulate the interaction between an attacker and a defender as an optimal stopping game where attack and defense strategies evolve through reinforcement learning and self-play. The game-theoretic modeling enables us to find defender strategies that are effective against a dynamic attacker, i.e., an attacker that adapts its strategy in response to the defender strategy. Further, the optimal stopping formulation allows us to prove that optimal strategies have threshold properties. To obtain near-optimal defender strategies, we develop Threshold Fictitious Self-Play (T-FP), a fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that T-FP outperforms a state-of-the-art algorithm for our use case. The experimental part of this investigation includes two systems: a simulation system where defender strategies are incrementally learned and an emulation system where statistics are collected that drive simulation runs and where learned strategies are evaluated. We argue that this approach can produce effective defender strategies for a practical IT infrastructure.
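To make the idea of learning equilibria through self-play concrete, here is a minimal sketch of classical fictitious play on a two-player zero-sum matrix game. It is not T-FP: the abstract does not describe that algorithm's update rules, so this example swaps in the textbook procedure in which each player repeatedly best-responds to the opponent's empirical action frequencies. The function name, the matching-pennies payoff matrix, and the iteration count are assumptions chosen for illustration.

```python
def fictitious_play(payoff, iterations=5000):
    """Classical fictitious play for a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player
    receives the negative. Returns the empirical mixed strategies.
    Generic textbook procedure, not the T-FP algorithm.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Start each action count at 1 so the empirical frequencies are defined.
    row_counts = [1] * n_rows
    col_counts = [1] * n_cols
    for _ in range(iterations):
        # Value of each action against the opponent's action counts so far
        # (unnormalized counts give the same best response as frequencies).
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = min(range(n_cols), key=col_values.__getitem__)
        # Both players update simultaneously.
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])


# Matching pennies: the unique Nash equilibrium mixes 50/50 for both players.
matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
row_strategy, col_strategy = fictitious_play(matching_pennies)
print(row_strategy, col_strategy)
```

In this toy game the empirical frequencies drift toward the uniform equilibrium as the number of iterations grows, which is the same qualitative behavior a self-play method aims for in the much larger attacker-defender game.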

Applying artificial intelligence to optimize the trawling path and operational parameters for Antarctic krill

Liu, Zhou, Wan et al.

2023

Ocean Engineering

Learning Near-Optimal Intrusion Responses Against Dynamic Attackers

Hammar, Stadler

2024

IEEE Trans. Netw. Serv. Manage.

We study automated intrusion response and formulate the interaction between an attacker and a defender as an optimal stopping game where attack and defense strategies evolve through reinforcement learning and self-play. The game-theoretic modeling enables us to find defender strategies that are effective against a dynamic attacker, i.e., an attacker that adapts its strategy in response to the defender strategy. Further, the optimal stopping formulation allows us to prove that best response strategies have threshold properties. To obtain near-optimal defender strategies, we develop Threshold Fictitious Self-Play (T-FP), a fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that T-FP outperforms a state-of-the-art algorithm for our use case. The experimental part of this investigation includes two systems: a simulation system where defender strategies are incrementally learned and an emulation system where statistics are collected that drive simulation runs and where learned strategies are evaluated. We argue that this approach can produce effective defender strategies for a practical IT infrastructure.
