MERLIN -- Malware Evasion with Reinforcement LearnINg

Quertier, Tony; Marais, Benjamin; Morucci, Stéphane; Fournel, Bertrand

doi:10.48550/arxiv.2203.12980

Cited by 3 publications (5 citation statements)

References 14 publications (22 reference statements)

“…In contrast with other approaches [22,23], we do not verify the functionality preservation of generated AEs, but we propose validating each modification individually before the generation process. Therefore, our approach is more time-efficient as it does not require discarding nonfunctional AEs during or at the end of the generation procedure.…”

Section: Validity of PE File Modifications (mentioning; confidence: 92%)

“…Quertier et al. in [23] used reinforcement learning algorithms to attack MalConv, GBDT by EMBER and Grayscale (convolutional neural network interpreting PE binaries as images) classifiers in grey-box settings with available prediction scores for learning. Further, the authors targeted commercial AV in a pure black-box environment as well.…”

Section: Reinforcement Learning-based Attacks (mentioning; confidence: 99%)

Creating valid adversarial examples of malware

Kozák, Jureček, Stamp, et al. (2024). J Comput Virol Hack Tech

Because of its world-class results, machine learning (ML) is becoming increasingly popular as a go-to solution for many tasks. As a result, antivirus developers are incorporating ML models into their toolchains. While these models improve malware detection capabilities, they also carry the disadvantage of being susceptible to adversarial attacks. Although this vulnerability has been demonstrated for many models in white-box settings, a black-box scenario is more applicable in practice for the domain of malware detection. We present a method of creating adversarial malware examples using reinforcement learning algorithms. The reinforcement learning agents utilize a set of functionality-preserving modifications, thus creating valid adversarial examples. Using the proximal policy optimization (PPO) algorithm, we achieved an evasion rate of 53.84% against the gradient-boosted decision tree (GBDT) detector. The PPO agent previously trained against the GBDT classifier scored an evasion rate of 11.41% against the neural network-based classifier MalConv and an average evasion rate of 2.31% against top antivirus programs. Furthermore, we discovered that random application of our functionality-preserving portable executable modifications successfully evades leading antivirus engines, with an average evasion rate of 11.65%. These findings indicate that ML-based models used in malware detection systems are sensitive to adversarial attacks and that better safeguards need to be taken to protect these systems.

“…Their code transformation process first defines a set of actions performed on the Windows PE header, such as inserting overlay bytes, packing, and unpacking. MERLIN [14] and Pesidious [15] used action techniques optimized by reinforcement learning algorithms to write agents that learn to manipulate PE files based on a reward obtained by taking specific manipulation actions. Their code manipulation process first defines a set of actions performed on the Windows portable executable (PE) header, such as inserting overlay bytes, packing, and unpacking, etc.…”

Section: Code Transformation Actions (mentioning; confidence: 99%)

“…This characteristic makes it almost impracticable to discover an approximate or exact differentiable function over the payload feature space [8][9][10][11][12][13]. Initial observations from the literature [5,8,9,12,13,14,15,16] point out that code transformation actions such as appending semantic NOP instructions, inserting jump instructions, and replacing existing instructions, when applied to software or an executable file, can obfuscate the file against piracy or lower the file's true positive rate. In this work, we enhanced these code transformation actions with a Dynamic Programming-based search method, a reinforcement learning algorithm, to increase their evasive potency against static malware scanners while satisfying the behavior-preserving criteria.…”

Section: Introduction (mentioning; confidence: 99%)

Dynamic Programming-based Adversarial Windows Payload Generator

Kingful, Ahene, Appiah, et al. (2023). Preprint

This work presents a behavior-preserving adversarial payload framework against static Windows malware scanners. The framework uses Dynamic Programming to decide on the sequence of static code transformation actions that transform a Windows payload into its adversarial state. In an empirical evaluation with Windows payloads from the Metasploit Framework in a black-box setting, static machine-learning-based scanners and the majority of commercial antivirus scanners could still be evaded by these transformations. The potency of the generated adversarial payloads in breaching commercial antivirus on users' devices was demonstrated. The experimental results show that a generated adversarial backdoor Trojan evades static detection as well as its offline dynamic detector and establishes a backdoor on the user's device.

“…However, there is still very little work that directly applies reinforcement learning methods to malware detection. Although some such works exist [7], we consider adopting the solution in this article to further improve detection efficiency and intelligence. Although the internal program structure of each piece of malware differs, its malicious behavior must eventually be carried out as actual dynamic behavior.…”

Section: Introduction (mentioning; confidence: 99%)

Malware behavior detection method based on reinforcement learning

Cui, Leng, Wang, et al. (2023). International Conference on Computer Application and Information Security (ICCAIS 2022)

Malware in the network environment is a serious threat to the security of industrial control systems. The gradual increase in malware variants poses great challenges to the detection of, and security protection against, industrial control system malware. Existing detection methods have limitations such as low intelligence in adaptive detection and recognition. In response to this problem, this paper designs a detection framework that applies reinforcement learning, an advanced machine learning technique, to the malware that threatens the network security of industrial control systems. In the implementation, guided by the practical needs of malware behavior detection and drawing fully on reinforcement learning's sequential decision-making and dynamic feedback learning, the key modules (a feature extraction network, a policy network, and a classification network) are discussed and designed in detail. Experiments on a real malware test dataset verify the effectiveness of the method, which can provide intelligent decision-making support for general malware behavior detection.
